Abstract

This research considers the optimal allocation of weapons to a collection of targets
with the objective of maximizing the value of destroyed targets. The weapon-target
assignment (WTA) problem is a classic non-linear combinatorial optimization problem
with an extensive history in operations research literature. The dynamic weapon
target assignment (DWTA) problem aims to assign weapons optimally over time using
the information gained to improve the outcome of their engagements. This research
investigates various formulations of the DWTA problem and develops algorithms for
their solution. First, a two stage stochastic WTA problem is explored which assumes
independence of the two stages. Next a two stage shoot-look-shoot (SLS) formulation
is explored in which the second stage targets are dependent on the first stage
allocations. A novel multi-stage DWTA formulation is then presented in which kill
probabilities are dynamic and dependent on the current set of targets. Finally, an
embedded optimization problem is introduced in which optimization of the multi-stage
DWTA is used to determine optimal weaponeering of aircraft.

Because of its flexibility and applicability to sequential optimization problems,
approximate dynamic programming is applied to the various formulations of the WTA
problem. Like many problems in the field of combinatorial optimization, the DWTA
problem suffers from the curses of dimensionality, and optimality is often
computationally intractable. As such, approximations are developed which exploit the
special structure of the problem and allow for efficient convergence to high-quality
local optima. Finally, a genetic algorithm solution framework is developed to test
the embedded optimization problem for aircraft weaponeering.
APPROXIMATE DYNAMIC PROGRAMMING
FOR MILITARY RESOURCE ALLOCATION

I.  Introduction 


1.1  Background 

The weapon-target assignment (WTA) problem is a classic resource allocation
problem in the field of military operations research where the objective is to optimally
assign M weapons to N targets such that the expected remaining target value is
minimized (or total expected destroyed target value is maximized). Because of its
applicability to numerous issues facing military analysts, such as ballistic missile
defense, air-to-ground operations, and integrated air defense systems (IADS), this
problem continues to be of significant operational importance. Additionally, because
of the variety of formulations and the extreme complexity of each, the WTA problem
is also significant in the theoretic realm.

The  WTA  problem  was  first  formally  posed  in  1958  [63],  and  is  known  to  be 
NP-complete  [60].  Since  then,  much  research  has  been  done  which  provides  exact 
(optimal)  or  heuristic  (not  provably  optimal)  solutions  for  a  variety  of  instances  of 
the  WTA  problem. 

Though it appears under many names, the WTA problem takes two general forms:
static and dynamic. In the static WTA (SWTA) problem all information
is known a priori and all allocations are made at one time. The dynamic WTA
(DWTA) problem may take many forms, though the underlying structure of each of
these is a sequential decision process. In the DWTA, at stage t, weapons allocations
must be made, the outcome of which impacts the future state space.

In  both  cases  (SWTA  and  DWTA),  there  is  a  single-shot  probability  of  kill  for  a 
given  weapon-target  assignment.  For  the  SWTA,  the  stochastic  nature  of  the  problem 
is  handled  using  simple  expectations  of  the  outcomes.  However,  for  many  of  the 
DWTA  formulations,  future  stages  present  an  additional  stochastic  element  where 
the  variance  of  each  outcome  significantly  impacts  future  decisions.  As  such,  the 
DWTA  maintains  increased  complexity  for  which  few  solution  techniques  exist. 

For deterministic problems with static resources and requirements, numerous
methods exist for efficient search of the solution space. There are several cases where
optimality has been proven, each under simplifying assumptions. As many practical
problems are stochastic and dynamic in nature, most traditional methods fall short.
Additionally, as the number of weapons and targets increases, the state, decision, and
outcome spaces within a dynamic programming framework increase exponentially.
These are known as dynamic programming's curses of dimensionality [82]. Much of
the existing research focuses on solution techniques for the static problem in lieu of
the more complex, and practical, dynamic formulation. Therefore, it is important to
develop methodologies which can handle this sequential decision process efficiently
while still providing high-quality solutions.

1.2  Motivation 

Analysts at the Air Force Research Laboratory (AFRL) are developing a simulation
framework in which future weapons concepts may be tested prior to development.
As part of their framework they must analyze the effect a specific mix of weapons
may have against specific IADS scenarios. Their current methodology steps forward
in time, randomly selecting a weapons employment strategy until the aircraft is
destroyed. At this point, the simulation steps back to the last time the aircraft was
alive and tries a different tactic. This process is repeated until the aircraft makes it
through the whole simulation and the full policy is recorded as a possible solution. A
set of candidate solutions is then selected and simulated repeatedly, and statistics are
collected. This methodology is generally inefficient, especially given the sequential
nature and complexity of the embedded assignment problem.

The  objective  of  the  AFRL  research  effort  is  to  optimize  a  mix  of  weapons  to 
inform  acquisition  of  future  systems  while  examining  any  synergistic  effects  kinetic 
and  directed  energy  weapons  may  have  together.  Because  the  target  set  is  assumed 
to  be  an  IADS,  weapons’  capabilities  will  likely  change  as  targets  are  destroyed. 
Currently,  there  is  no  formulation  in  the  literature  that  considers  probabilities  of  kill 
which  evolve  as  a  function  of  the  target  set.  Additionally,  within  the  simulation, 
weapon  assignments  should  consider  their  impact  on  the  evolution  of  the  system, 
instead  of  being  myopically  allocated.  Because  of  this,  a  dynamic  instance  of  the 
weapon  target  assignment  is  appropriate. 

To optimize the set of weapons used, a genetic algorithm (GA) has been developed
for which an objective (or fitness) value must be computed for each design
point. For this problem, the fitness value depends on the allocation and capability
of each weapon being investigated. Few efficient allocation strategies are present in
the literature, and where they exist, they are for static assignment. Further, within
the GA, no methodology is in place to define when or how the weapons are to be
fired. The sequence of how the weapons are fired may be considered in the design
space, impacting the size of the space to be searched. As an alternative, using the
sequential solution nature of dynamic programming, we can more efficiently search
the design space by providing the optimal allocations to the simulation. Dynamic
programming has the flexibility to be integrated directly within the simulation by
yielding an efficient policy through a functional approximation given the state of the
system.

Because of the many complexities of the motivating problem, both a theoretical
advancement of provable optimality and practical application are necessary. Further,
gaps in the current literature must be addressed which consider dynamic kill
probabilities, the large decision space of the DWTA problem, and the embedded nature of
the GA solution.

1.3  Research  Contributions 

Though it is a classic resource allocation problem, the weapon-target assignment
problem is still of interest to military practitioners and academics alike. This
dissertation develops numerous solution techniques for various formulations of the DWTA
problem. Specifically, this research provides the following contributions:

• Develop an adaptive dynamic programming algorithm which optimally solves a
two-stage stochastic WTA problem with homogeneous weapons

•  Extend  the  adaptive  dynamic  programming  method  to  a  shoot-look-shoot  (SLS) 
DWTA  problem  to  efficiently  provide  high-quality  solutions 

•  Formally  pose  the  cooperative,  multi-stage,  dynamic  weapon-target  assignment 
problem 

• Use order statistics to reduce the size of the allowable decision space within
a dynamic programming solution methodology

• Formulate and solve an embedded optimization problem in which the sequential
allocation of weapons to targets determines item utility within a knapsack
problem

• Develop a genetic algorithm solution framework which integrates the use of
ADP to determine optimal weapons allocations for testing within a simulation

1.4  Paper  Structure 

The remainder of this dissertation is organized into six chapters. Chapter II
contains a detailed literature review, comprising a survey of the weapon
target assignment problem followed by a discussion of approximate dynamic
programming as a solution methodology. Chapter III provides an optimal method for
a two-stage stochastic WTA problem. Chapter IV extends the research of Chapter
III and investigates a two-stage shoot-look-shoot formulation of the WTA problem
where the second stage targets depend on the outcome of the first-stage assignments.
Chapter V develops an approximate value iteration methodology through the use
of order statistics, and Chapter VI describes the case study in which the solution
methodologies are integrated within a general simulation framework that solves a
complex embedded optimization problem. Finally, Chapter VII provides conclusions,
highlights the major contributions, and provides recommendations for future research.
II. Literature Review


2.1 Weapon-Target Assignment Problem

The weapon-target assignment (WTA) problem is a well known military operations
research problem. Though the static WTA was initially posed formally by
Manne [63] as a special case of the transportation problem, it was first informally
posed by Merrill Flood at The Princeton University Conference on Linear Programming
in March of 1957 as similar to the personnel assignment problem [66]. Another
item of interest, as shown in [63], is that Dantzig is responsible
for the formulation that is widely used today. Since this time, substantial research
has been dedicated to determining the optimal allocation of weapons to targets. Two
general formulations are investigated in the literature: static and dynamic. In the static
formulation, though the outcomes of the assignments are stochastic, all information
is assumed known prior to making the assignment, and all allocations are made at
one time. This is the problem posed by Manne [63]. First formulated by Hosein,
Walton and Athans [48], the dynamic problem has stochastic elements similar to those of the
static problem, but assignments are made in multiple stages. Likely due to the
standardized formulation of the problem, the static WTA (SWTA) problem is the most
widely researched formulation in the literature. An early extension of the problem is
given by Day [28], who uses a three-stage decomposition technique to solve a weapons
allocation problem by relating the assignment problem to that of decentralized
planning in large organizations. Matlin [66] provides the first survey of missile allocation
literature, which is later updated by Cai et al. [42]. Eckler and Burr [31] also provide
numerous examples and mathematical models of missile allocation and target coverage
problems. The WTA problem is equivalently postured as both offensive, where
the objective is to maximize the damage to the targets, and defensive, where the
objective is to minimize the value of any remaining targets. Other formulations also
consider an asset-based defense, where the objective is to minimize damage done to a
set of assets by assigning interceptors to incoming adversarial missiles [16] [102] [101].

2.1.1 Static Weapon-Target Assignment Problem.

The SWTA is formulated as follows. Let $V_j$ denote the value of the $j$th target and $W_i$
denote the number of available weapons of type $i$. It is assumed that there are $m$
weapon types and $n$ targets. Let $p_{ij}$ be the single shot probability of the $i$th weapon
killing the $j$th target, such that the single shot probability of survival is $q_{ij} = 1 - p_{ij}$.
The decision variable $x_{ij}$ is the number of weapons of type $i$ assigned to target $j$. The
defensive SWTA problem is then formulated as a nonlinear integer program:

\[ \min \sum_{j=1}^{n} V_j \prod_{i=1}^{m} q_{ij}^{x_{ij}} \tag{2.1} \]

subject to

\[ \sum_{j=1}^{n} x_{ij} \le W_i \quad \text{for all } i = 1, 2, \ldots, m, \tag{2.2} \]

\[ x_{ij} \ge 0 \text{ and integer, for all } i = 1, 2, \ldots, m,\ j = 1, 2, \ldots, n. \tag{2.3} \]

The SWTA was shown to be NP-complete in 1986 by Lloyd and Witsenhausen
[60]. As such, much research has been done in the past several decades to efficiently
determine optimal solution methods. Optimal solutions exist under two simplifying
assumptions of the SWTA. First, given a homogeneous weapon set, $p_{ij} = p_j$ for all
$i$, denBroeder [30] shows optimality is achieved by evenly distributing the weapons
across as many targets as possible using the maximum marginal return (MMR)
algorithm. This algorithm assigns weapons sequentially to the target with the highest
remaining expected damage value until all weapons have been allocated. The second
instance assumes that each target can have at most one weapon assigned to it [24] [74].
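
The greedy MMR rule described above can be sketched in a few lines of Python. This is a minimal illustration under the homogeneous weapon assumption, not the implementation from [30]; the function name `mmr_allocate` and its arguments are hypothetical. At each step the next weapon goes to the target whose expected surviving value would drop the most, that is, the target maximizing $V_j q_j^{x_j} p_j$.

```python
import heapq

def mmr_allocate(values, kill_probs, num_weapons):
    """Greedy maximum marginal return allocation for identical weapons.

    values[j]     : value V_j of target j
    kill_probs[j] : single-shot kill probability p_j against target j
    num_weapons   : number of identical weapons to assign
    Returns (x, expected_surviving_value), where x[j] is the number of
    weapons assigned to target j.
    """
    n = len(values)
    x = [0] * n
    # Expected surviving value of each target under the current allocation.
    surviving = list(values)
    # Max-heap (via negated keys) of the marginal return of one more weapon,
    # V_j * q_j**x_j * p_j, for every target.
    heap = [(-surviving[j] * kill_probs[j], j) for j in range(n)]
    heapq.heapify(heap)
    for _ in range(num_weapons):
        _, j = heapq.heappop(heap)
        x[j] += 1
        surviving[j] *= 1.0 - kill_probs[j]
        heapq.heappush(heap, (-surviving[j] * kill_probs[j], j))
    return x, sum(surviving)

# Hypothetical instance: three targets, four identical weapons.
allocation, remaining_value = mmr_allocate([10.0, 5.0, 1.0], [0.5, 0.5, 0.5], 4)
```

Because each additional weapon on a target yields a smaller marginal return than the previous one, a priority queue keyed on the current marginal return is sufficient; no assignment ever needs to be revisited.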

2.1.2 Current Literature of the Static Weapon-Target Assignment
Problem.

Of all the specific formulations, the SWTA problem has received the majority of the
attention in the literature; in addition, several
papers have been published since the 2006 survey by Cai et al. [42]. As with many
NP-complete or other combinatorial optimization problems, the existing literature
applies a wide variety of methods to quickly generate high-quality, but generally
suboptimal, solutions. Ahuja et al. [5] present commonly cited results and give a
benchmark for solution quality through lower bounding techniques (for the minimization
problem). Their formulation uses integer linear programming and a general
integer network flow problem using a minimum cost flow to determine a new lower
bound (if minimizing). The authors also provide a very large-scale neighborhood
improvement heuristic algorithm which quickly solves moderately sized instances (up to
80 weapons and targets) optimally while providing high-quality solutions for larger
problems (up to 200 weapons and targets). As previously discussed, the earliest
optimal method, known as the MMR algorithm, was presented by denBroeder [30] under a
homogeneous weapon set assumption. This greedy method is also a fast method
for bounding the solution when the homogeneous weapons assumption has been
relaxed. Chang et al. [24] and Orlin [74] developed optimal methods under the
assumption that each target can have no more than one weapon assigned to it. These
methods exploit the underlying network flow structure of the SWTA problem.

Since the first approximation technique for the SWTA was developed in 1966 [28], a
gamut of popular metaheuristics has been applied to the SWTA problem. This
includes ant colony optimization (ACO) [57] [88], particle swarm [34] [104] (of a slightly
more generalized resource allocation problem), and genetic algorithms (GAs) [19] [58]
[49] [61]. As stand-alone methods, simulated annealing (SA) and tabu search are two
popular heuristics for which literature gaps appear to exist. There are, however,
hybrid methods used to provide solutions for the SWTA, including ACO with SA [97],
GA with ACO [33], GA using greedy search procedures to improve the quality of the
offspring [59], and particle swarm with embedded greedy algorithms [50]. Turan [95]
provides a comparison of several heuristic algorithms for the WTA problem and poses
a new hybrid algorithm consisting of particle swarm and random search to produce
higher-quality solutions. In addition to these popular metaheuristic methods, several
other approximation methods have been used for the SWTA. Chen, Ren, and Deng
[26] use a modified MMR type algorithm after changing the network representation
from a one-to-many to a one-to-one mapping to efficiently approximate the optimal
value. Rosenberger et al. [85] compare the sequential application of the auction
algorithm in a greedy fashion to an exact (but computationally expensive) branching and
bounding technique. Sahin and Leblebicioglu [62] apply fuzzy reasoning to
approximate optimum allocations in real-time for use on a battlefield. Lastly, Lagrangian
relaxation [72] was used to decompose the problem into two tractable subproblems
while iteratively updating the Lagrange multipliers. Of the extensive amount of
research done for the SWTA, Ahuja et al. [5] appears to be the most widely accepted
solution method for the general SWTA problem. Next, the more complex dynamic
weapon target assignment formulation is discussed, followed by a review of existing
literature.
2.1.3 Dynamic Weapon-Target Assignment Problem.


The  DWTA  divides  the  total  duration  of  an  offensive  attack  into  several  discrete 
time  steps  in  which  information  is  obtained  about  the  allocation  outcomes  of  the 
previous  stages.  Any  targets  destroyed  during  a  stage  are  no  longer  targeted  in 
subsequent  stages,  allowing  the  operator  to  make  better  use  of  their  weapons.  The 
basic  assumptions  of  the  DWTA,  as  outlined  in  [47],  are  as  follows: 

•  In  each  stage,  a  subset  of  weapons  is  selected  and  committed  simultaneously. 

• The outcomes of each stage are observed prior to the following stage (this can
either be perfect knowledge or stochastic, though Hosein [47] assumes perfect
knowledge).

Furthermore, the methodology of Hosein and Athans [47] obtains solutions by

• Re-solving the problem at each stage using previous stage information

• Computing the optimal assignment for the current stage under the assumption that
optimal assignments will be made in subsequent stages

•  Selecting  weapons  at  each  stage  with  the  goal  of  optimizing  the  expected  sum 
of  realized  values  over  all  stages 

The multi-stage problem as formulated in [48] is as follows. Let $T$ = the number
of time stages, $M$ = the number of weapons, $N$ = the number of targets, and $V_j$ =
the value of target $j$ for $j = 1, 2, \ldots, N$. Let $p_{ij}(t)$ = the single-shot probability of
kill if weapon $i$ is assigned to target $j$ in stage $t$, $i = 1, 2, \ldots, M$, $j = 1, 2, \ldots, N$,
$t = 1, 2, \ldots, T$, and $q_{ij}(t) = 1 - p_{ij}(t)$ be the corresponding probability of survival.
Define the decision variables $x_{ij}$ as

\[ x_{ij} = \begin{cases} 1, & \text{if weapon } i \text{ is assigned to target } j \text{ in stage 1} \\ 0, & \text{otherwise.} \end{cases} \]

Next, define the $N$-dimensional binary vector target state $u \in \{0,1\}^N$ and the
$M$-dimensional binary vector weapon state $w \in \{0,1\}^M$, where

\[ u_j = \begin{cases} 1, & \text{if target } j \text{ survives stage 1} \\ 0, & \text{if target } j \text{ is destroyed in stage 1} \end{cases} \]

and

\[ w_i = \begin{cases} 1, & \text{if weapon } i \text{ is not used in stage 1} \\ 0, & \text{if weapon } i \text{ is used in stage 1.} \end{cases} \]

Then, for any initial weapon-target assignment $x_{ij}$, $u$ is an $N$-dimensional random
vector at the start of the second stage which captures the outcomes of the assignments.
As shown in [47], the distribution of the $u_j$'s is

\[ P[u_j = k] = k \prod_{i=1}^{M} \left(1 - p_{ij}(1)\right)^{x_{ij}} + (1 - k)\left\{ 1 - \prod_{i=1}^{M} \left(1 - p_{ij}(1)\right)^{x_{ij}} \right\} \tag{2.4} \]

for $k = 0, 1$ and $j = 1, 2, \ldots, N$. Equation 2.4 determines the probability with which
states transition over time. The weapon states then transition over time using

\[ w_i = 1 - \sum_{j=1}^{N} x_{ij}, \quad i = 1, 2, \ldots, M. \tag{2.5} \]

Define $F_2^*(u, w)$ as the optimal cost of a $T - 1$ stage problem given an initial
state $(u, w)$. Then, because this is defined in terms of $T$ two-stage subproblems, by
recursively using

\[ F_{T+1}(u, w) = \sum_{j=1}^{N} V_j u_j, \tag{2.6} \]

the DWTA formulation is:

\[ \min_{x_{ij}} F_1 = \sum_{\omega \in \{0,1\}^N} P[u = \omega]\, F_2^*(\omega, w) \tag{2.7} \]

subject to

\[ x_{ij} \in \{0, 1\}, \quad i = 1, 2, \ldots, M,\ j = 1, 2, \ldots, N, \tag{2.8} \]

\[ \text{with } w_i = 1 - \sum_{j=1}^{N} x_{ij}, \quad i = 1, 2, \ldots, M. \tag{2.9} \]

Here, $\omega$ is the random outcome based on the current stage assignment. In words,
the objective takes the expectation, over all possible second stage target states, of the
optimal second stage cost and minimizes this expectation over the current stage
assignment. Recursion is used $T - 1$ times to determine
the optimal $T$-stage weapon-target allocation.
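
To make the state transition concrete, the following Python sketch enumerates the distribution of the target state after one stage and evaluates the expected surviving target value. It is an illustration only: the function names and the small instance are hypothetical, and the joint distribution is formed under the assumption that targets survive independently of one another given the assignment.

```python
from itertools import product

def survival_probs(p, x):
    """P[u_j = 1] for each target j: product over weapons i of (1 - p[i][j]) ** x[i][j]."""
    m, n = len(p), len(p[0])
    probs = []
    for j in range(n):
        s = 1.0
        for i in range(m):
            s *= (1.0 - p[i][j]) ** x[i][j]
        probs.append(s)
    return probs

def outcome_distribution(p, x):
    """Map each target state omega in {0,1}^N to P[u = omega].

    Assumes (for illustration) that target outcomes are independent.
    """
    s = survival_probs(p, x)
    dist = {}
    for omega in product((0, 1), repeat=len(s)):
        prob = 1.0
        for u_j, s_j in zip(omega, s):
            prob *= s_j if u_j == 1 else 1.0 - s_j
        dist[omega] = prob
    return dist

def expected_surviving_value(values, dist):
    """Sum over omega of P[u = omega] * sum_j V_j * omega_j."""
    return sum(
        prob * sum(v * u for v, u in zip(values, omega))
        for omega, prob in dist.items()
    )

# Hypothetical instance: 2 weapons, 2 targets, weapon 1 -> target 1, weapon 2 -> target 2.
p = [[0.6, 0.3], [0.5, 0.8]]   # p[i][j]: kill probability of weapon i on target j
x = [[1, 0], [0, 1]]           # x[i][j]: 1 if weapon i is fired at target j
values = [4.0, 2.0]
dist = outcome_distribution(p, x)
cost = expected_surviving_value(values, dist)
```

For a single remaining stage, the expected value computed from the full outcome distribution coincides with the static objective $\sum_j V_j \prod_i q_{ij}^{x_{ij}}$; the enumeration only becomes necessary when a further stage of decisions follows each outcome.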

Hosein [47] also provides a list of important properties of the DWTA and argues that,
given these properties, obtaining algorithms which efficiently solve this problem
optimally is unlikely. These characteristics are:

(a) NP-complete - the dynamic WTA problem is NP-complete

(b) Discrete (and integer) - fractional weapon assignments are not allowed

(c) Dynamic (and sequential) - the current stage outcomes inform future decisions

(d) Nonlinear - with a convex objective function

(e) Stochastic - the outcomes of the assignments are probabilistic in nature

(f) Large-scale - as problem size increases, enumeration techniques become
impractical or computationally intractable

If the state is defined such that it consists of the number of stages remaining, and
the current weapon and target states (based on the transitions already defined), then,
structurally, all the elements necessary to classify it as a sequential decision process
are present. Additionally, since the current decision only depends on the current
state (which captures the necessary previous information to make our decision), the
Markovian property is satisfied as well. As such, this problem is ideally suited for
solution using dynamic programming. However, as is often the case for dynamic
programming problems, as the state space increases, so does the decision space, as
well as the possible outcomes at each stage. Therefore, this problem suffers from the
three curses of dimensionality, and as the number of states increases (i.e., the number
of weapons and targets increases), traditional solution methods become intractable.
These three curses of dimensionality arise from exponential growth of the state
space, decision space, or outcome space, or their combined increase. Before moving on
to a discussion of mitigation strategies for these curses, a simplified DWTA problem
consisting of only two stages is presented.
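
As a rough illustration of this growth, the sketch below counts the three spaces for the binary state representation used above. The counting rules are simplifying assumptions made for this example: a per-stage state is a pair $(u, w)$, each available weapon is either held or fired at exactly one target, and every target may independently survive or be destroyed. The function name is hypothetical.

```python
def dwta_space_sizes(num_weapons, num_targets):
    """Upper bounds on the per-stage state, decision, and outcome spaces.

    states    : every binary target state u paired with every weapon state w
    decisions : each weapon is held or assigned to one of the targets
                (counted from the state in which all weapons are available)
    outcomes  : every subset of targets may survive the stage
    """
    states = (2 ** num_targets) * (2 ** num_weapons)
    decisions = (num_targets + 1) ** num_weapons
    outcomes = 2 ** num_targets
    return states, decisions, outcomes

# Print the growth for a few hypothetical problem sizes.
for size in (3, 5, 10):
    states, decisions, outcomes = dwta_space_sizes(size, size)
    print(f"M = N = {size}: states={states}, decisions={decisions}, outcomes={outcomes}")
```

Even at ten weapons and ten targets the decision space alone exceeds $10^{10}$ candidate assignments per state, which is why exact enumeration is limited to very small instances.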

2.1.4  Two-Stage  DWTA. 

As a simplification to the multi-stage problem posed by Hosein [48], Murphey [70]
defined a two-stage stochastic programming model of the weapon target assignment
problem. In this model, an adversary has a total stockpile of weapons and shoots a
portion of them in the first stage; the remainder of the weapons, known only through
a probability distribution, arrive in the second stage.
An important distinction for this problem is that the second stage target arrivals are
independent of the first stage assignments. Let $n_1$ targets arrive in stage 1, and $n_2$
targets arrive in stage 2. Let the random vector $\omega \in \Omega$ denote the number of targets
in the second stage. Suppose the probabilities of survival, $q_i$, and target values, $V_i$,
for each target $i$ are given. Then the 2-stage WTA programming formulation is:

\[ Z_1(x) = \min_x f_1(x) + E_{\omega \in \Omega}[Z_2(x, \omega)] \tag{2.10} \]

subject to

\[ x \le b, \]

\[ x_i \in \mathbb{N}_+, \quad i = 1, \ldots, N, \]

where

\[ f_1(x) = \sum_{i=1}^{n_1} V_i q_i^{x_i} \]

is the first stage value function of the first stage assignment $x$ and is integer convex,
$E_{\omega \in \Omega}[Z_2(x, \omega)]$ is the expected second stage value and is integer convex (meaning the
relaxation problem is convex), where $\omega$ is a scenario in stage 2 and is solved using
the MMR algorithm, $\sum_i x_i \le M$ is the resource capacity constraint, and $b$ is the
vector denoting the maximum weapons that can be assigned to any one target.

$Z_2(x, \omega)$ is the solution to the second stage problem and is expressed as:

\[ Z_2(x, \omega) = \min_y f_2(y) \tag{2.11} \]

subject to

\[ \sum_{i=1}^{n_1} x_i + \sum_{i=1}^{n_2(\omega)} y_i = M, \]

\[ y \le b, \]

\[ y \in \mathbb{N}^{n_2}, \]

where

\[ f_2(y) = \sum_{i=1}^{n_2(\omega)} V_i q_i^{y_i} \]

is the second stage value function; $f_2(y)$ depends on the outcome of $\omega$ and is
integer-convex.

Murphey [70] uses a decomposition method to decouple the first and second stage.
The decomposition method first solves a variant of the stage 1 problem called the
current problem:

\[ \min_{x, \theta} f_1(x) + \theta \tag{2.12} \]

subject to

\[ Ax \le b, \quad x \in X, \tag{2.13} \]

with a scalar $\theta$ taking the place of the second stage value so that

\[ \theta \ge \sum_{j=1}^{s} p_j Z_2(x, \omega_j), \]

where $s$ is the total number of scenarios and $p_j$ is the probability of scenario $j$
occurring.

Murphey [70] uses stochastic decomposition to come up with approximate solutions
for this formulation.
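
For very small instances, the two-stage model itself can be evaluated by brute force, which helps in checking intuition about how many weapons to hold in reserve. The Python sketch below is not the stochastic decomposition method of [70]; it simply enumerates every first stage allocation and scores it as the first stage value plus the scenario-weighted second stage value, with each second stage problem solved greedily in the MMR fashion. All names are hypothetical, weapons are assumed identical, the per-target bound $b$ is omitted, and the value functions are assumed to take the form $\sum_i V_i q_i^{x_i}$.

```python
from itertools import product

def mmr_value(values, q, k):
    """Expected surviving value after greedily assigning k identical weapons."""
    surviving = list(values)
    for _ in range(k):
        if not surviving:
            break
        # Fire at the target where one more weapon removes the most expected value.
        j = max(range(len(surviving)), key=lambda t: surviving[t] * (1.0 - q[t]))
        surviving[j] *= q[j]
    return sum(surviving)

def solve_two_stage(v1, q1, scenarios, num_weapons):
    """Enumerate first stage allocations x and return (best_cost, best_x).

    v1, q1    : values and survival probabilities of the stage 1 targets
    scenarios : list of (probability, values, survival_probs) for stage 2 targets
    """
    best = None
    for x in product(range(num_weapons + 1), repeat=len(v1)):
        used = sum(x)
        if used > num_weapons:
            continue
        first_stage = sum(v * q ** xi for v, q, xi in zip(v1, q1, x))
        # Weapons not fired in stage 1 are all available in stage 2.
        recourse = sum(
            prob * mmr_value(v2, q2, num_weapons - used)
            for prob, v2, q2 in scenarios
        )
        total = first_stage + recourse
        if best is None or total < best[0]:
            best = (total, x)
    return best

# Hypothetical instance: one stage 1 target, three weapons, and a 50% chance
# that a second target of value 8 appears in stage 2.
scenarios = [(0.5, [], []), (0.5, [8.0], [0.5])]
best_cost, best_x = solve_two_stage([10.0], [0.5], scenarios, 3)
```

In this instance the best first stage decision fires two weapons and holds one in reserve, which gives a lower expected surviving value than committing all three weapons immediately.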

2.1.5 Other Literature of the Dynamic Weapon-Target Assignment
Problem.

Though it has not been researched to the extent of the SWTA problem, the DWTA
problem provides a more practical implementation by including a temporal component.
As such, the DWTA is a much more complex problem from a mathematical
standpoint and has received a fair amount of attention in the literature. Similar to the
SWTA, numerous methods have been employed to provide solutions for various types
of DWTA problems. As the originator of the dynamic instance, Hosein [47] provides
several results which are generalizable to the DWTA problem. Murphey [70] [71] uses
stochastic decomposition for the two-stage problem previously defined. An extension
of the generalized two-stage problem called the shoot-look-shoot target assignment
problem also has a fair amount of associated literature, but will be discussed in the
next section. Specific to the general DWTA problem, Chang [24] uses a static WTA
approximation scheme within an iterative linear network flow framework to efficiently
provide high-quality solutions for the DWTA. Because of the integrality constraint of
the decision variables, the chromosome representation within a GA presents a useful
scheme for solving both the static and dynamic versions of the WTA problem. As
such, much work has developed hybrid GAs to assist in solving the DWTA. Wu et al.
[99] apply a modified GA to the DWTA and introduce weapon use deadlines within
the problem formulation. These deadlines follow the principles of scheduling theory,
and are in the form of additional constraints such that a weapon has to be shot at a
target by a specified time or it is rendered unusable. The authors call their method
a modified GA because it applies a basic GA iteratively, assigning a weapon to a
target (possibly suboptimally) immediately before the deadline is reached. Xin et al.
[101] develop a heuristic which uses problem information (domain knowledge) and
constraint programming to assign priorities to assignments. Evolutionary heuristics
which use a hybridized GA with memetic algorithms have also been applied to the
DWTA [25]. Additionally, Khosla [54] applies a hybrid heuristic which uses a
simulated annealing (SA) heuristic to determine the fitness of a population within a GA
framework. Other heuristic techniques applied to the DWTA include tabu search
[102], ACO with tabu table updates [103], and a modified Hungarian method with
PSO [56]. Lastly, exact dynamic programming [91] [89] has also been applied to the
DWTA. The last portion of the WTA literature review focuses on the specific
shoot-look-shoot scenario, as well as some miscellaneous WTA formulations and solution
methods not explicitly for WTA problems.

2.1.6  Other  Target  Assignment  /  Weapons  Allocation  Literature. 

Because  of  the  numerous  articles  dedicated  to  it,  methods  for  solving  the  specific 
shoot-look-shoot  (SLS)  problem,  as  well  as  some  other  miscellaneous  allocation  meth¬ 
ods,  are  now  discussed.  The  SLS  problem  is  a  dynamic  weapon  target  assignment 
problem  which  allows  for  multiple  allocation  stages  with  some  form  of  battle  damage 
assessment  after  assignments  are  made.  At  the  end  of  each  stage,  the  outcomes  of 
the  allocations  are  known  according  to  some  probability  distribution  prior  to  making 
the  subsequent  stage  allocations.  The  complexity  of  the  SLS  problem  is  the  depen¬ 
dency  of  future  stages’  target  sets  on  previous  weapon  assignments.  The  utility  of 
the  SLS  problem  is  that  it  demonstrates  the  impact  current  outcomes  have  on  initial 
weapons  allocations  with  the  knowledge  that  some  weapons  need  to  be  kept  for  future 
stages.  Additionally,  in  a  multi-stage  WTA  formulation,  a  myopic  SLS  policy  could 
be  implemented  at  each  stage  to  provide  a  bound  on  the  solution. 

Manor  and  Kress  [64]  prove  optimality  of  a  multi-stage  greedy  SLS  solution  against 
a  homogeneous  target  set  assuming  imperfect  damage  information.  They  also  show 
that  the  original  SLS  problem  is  equivalent  to  a  finite  horizon  deteriorating  ban¬ 
dit  problem,  which  dynamically  allocates  a  single  resource  amongst  a  fixed  number 
of  arms.  Aviv  and  Kress  [8]  evaluate  several  SLS  tactics  (such  as  the  persistent 
shooter, fixed bound on munitions and dynamic bound on munitions) and analyze
their  efficiency  when  damage  information  is  uncertain  (or  incomplete).  Glazebrook 
and Washburn [35] provide a brief survey of, and further investigate, the SLS problem
considering several scenarios in which information may be perfect or imperfect, the
time  horizon  is  finite  or  infinite,  and  homogeneity  (or  non-homogeneity)  of  weapons 
is considered. They approach the problem as a partially observable Markov decision
process (POMDP), and apply dynamic programming, citing the computational
intractability of their methods as problem size increases. Yost and Washburn [105]
also  decompose  the  problem  into  a  linear  program  to  obtain  an  initial  (bound)  set  of 
policies  and  use  dynamic  programming  to  help  improve  the  policies.  The  dynamic 
programming  subproblem  is  also  viewed  as  a  POMDP,  as  in  [35].  Karasakal  [51] 
applies  integer  programming  decomposition  to  determine  SLS  policies  for  allocating 
surface-to-air  missiles  within  a  naval  task  group.  Castanon  [23]  approaches  the  SLS 
problem  as  a  two  stage  resource  allocation  where  the  goal  is  to  maximize  the  first 
stage  allocations  while  considering  the  second  stage  recourse  requirements.  The  for¬ 
mulation  then  takes  on  a  similar  form  to  that  of  the  two-stage  stochastic  control 
problem  defined  by  Murphey  [71],  and  is  similar  to  a  constrained  two-stage  form  of 
Bellman’s  equation  [10].  Linear  interpolation  and  Lagrangian  decomposition  are  then 
used  to  quickly  approximate  recourse  actions  (the  2nd  shooting  stage  in  the  SLS). 
These  values  are  then  used  recursively  to  greedily  solve  the  first  stage  problem. 

Lastly, there are several other refereed journal articles which focus on areas related
to the WTA problem, but are generally not classified as such. For
brevity, and because of their uniqueness, each paper and its methodology are introduced,
but it is left to the reader to determine the specific formulations. Most of
these  papers  have  to  do  with  interceptor  allocation  specific  to  ballistic  missile  defense 
(BMD)  applications.  Gorfinkel  [40]  uses  decision  theory  to  maximize  the  probability 
that  a  warhead  is  hit  given  that  it  is  concealed  in  a  cloud  of  decoys.  Bracken,  Falk, 
and  Miercort  [17]  extend  a  model  introduced  by  Phipps  [78]  in  which  two  players  ex¬ 
change  nuclear  weapons.  In  this  model,  one  player’s  remaining  weapons  are  impacted 
by the other player’s strike package, thus the constraints of the second player become
a  function  of  the  allocation  of  the  first  striker.  This  max-min  problem  is  shown  to  be 
separable,  but  nonconvex,  so  traditional  methods  do  not  guarantee  optimality  via  a 
saddle  point.  Instead,  piecewise  linear  approximations  are  used,  and  solved  via  branch 
and  bound.  The  linear  approximations  match  the  nonlinear  objective  function  at  a 
predefined  number  of  gridpoints;  as  the  number  of  gridpoints  increases,  the  approx¬ 
imation  becomes  better,  but  at  the  cost  of  computation  time.  Metier,  Preston,  and 
Hofmann  [68]  present  various  solution  techniques  for  five  different  defensive  weapons 
allocation  problems.  The  formulations  investigated  vary  from  a  generalized  DWTA 
problem,  to  a  single  threat-target  assignment  problem,  while  solution  methodologies 
include  linear  and  non-linear  programming,  branch  and  bound,  greedy  approximation 
and  others.  Wilkening  [98]  derives  the  size  of  defense  necessary  to  meet  defense  ob¬ 
jectives  based  on  target  kill  probability,  and  applies  it  to  national  and  theater  missile 
defense. Bertsekas et al. [16] formulate the BMD problem as a Markov decision process
(MDP) and use neuro-dynamic programming where the cost-to-go functional
approximation  is  achieved  through  neural  network  architectures.  Brown  et  al.  [18] 
apply  a  two-sided  model  to  determine  the  optimal  location  to  pre-position  defensive 
platforms with the objective of minimizing the eventual damage from a ballistic missile
attack. Menq et al. [67] use discrete Markov decision process modeling as a
means  for  providing  distribution  functions  for  BMD  so  that  more  accurate  planning 
and  cost  analysis  may  be  used  in  practical  settings.  Arslan,  Marden,  and  Shamma  [7] 
develop  a  game-theoretical  formulation  for  vehicle-target  assignment  in  which  a  set 
of  vehicles  cooperatively  assign  themselves  to  a  set  of  targets  to  optimize  some  utility 
function. 
2.2 Approximate Dynamic Programming


2.2.1  Dynamic  Programming. 

First, the major concepts and assumptions which are used when considering
dynamic  programming  as  a  solution  methodology  are  briefly  introduced.  As  a  disci¬ 
pline,  dynamic  programming  is  a  “collection  of  mathematical  tools  used  to  analyze 
sequential  decision  processes”  [29].  As  discussed  in  Denardo  [29],  regardless  of  how 
unrelated  two  different  processes  may  seem,  there  are  several  underlying  components 
common to all sequential decision processes. Specifically, at each decision epoch, the
process is in some state; the goal is (or at least should be) to make the best (or
optimal) decision given that state; and, based upon the decision made at that point,
the process transitions to an outcome via some functional, or transition, equation.
It is also assumed that the decision will either incur a cost, or the decision maker
will obtain some immediate reward for making the decision. Once a transition takes
place and the cost has been incurred, the decision maker is faced with an updated
decision and the process starts over. One critical assumption for dynamic programming,
however, is that once the transition has been made, what happened previously is
entirely captured in the new state; thus future decisions do not depend on what
happened in the past, only on where the decision maker is at that epoch. This is
known as the Markovian property, without which much of the underlying mathematics
would be substantially more complex.

As previously discussed, for a generalized dynamic programming problem, several
structural elements will always be present. Using the notational conventions of
Bertsekas [12] [13] [15], they are defined as follows. Let $k$ be the index of either a discrete
time step, or a discrete decision epoch in continuous time. Then $x_k$ is the state of the
system at $k$, and contains everything necessary to make a decision, $u_k$. In stochastic
cases (discussed further later), there is a noise element, $w_k$, representing a random
occurrence or outcome which may be based on the state, the decision, both, or neither.
Finally, consider a time horizon $N$, which marks the point at which to terminate the
recursion, or equivalently the number of decision epochs. As will be discussed
later, this horizon may be finite ($N < \infty$) or infinite ($N = \infty$). The transition function
for the stochastic formulation is of the following form:

$$x_{k+1} = f_k(x_k, u_k, w_k), \quad k = 0, 1, \ldots, N-1 \tag{2.14}$$

where $f_k(\cdot)$ is a function defining the system dynamics. In dynamic programming,
the costs (or rewards) are also assumed to be additive, meaning that at each decision
epoch, the costs incurred up to that point are represented in some additive form,
and will be added to future costs. The cost function will be some function of the
state, decision and random outcome: $g_k(x_k, u_k, w_k)$. Therefore, given a terminal cost
for being in the final state $g_N(x_N)$, the total cost over time is


$$g_N(x_N) + \sum_{k=0}^{N-1} g_k(x_k, u_k, w_k) \tag{2.15}$$

Additionally, in the literature, the decision $u_k$ may be represented as a function of the
current state, or $u_k(x_k)$. Similarly, the random outcome may be a function of the state
and the decision made, as in the case of weapons allocation, $w_k(x_k, u_k)$, but may also
be itself a random occurrence (such as with the inventory demand example). However,
to remain consistent with Bertsekas, this notation will not be used. In addition, a
more explicit definition of these elements is found in [82] (pg 168).
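
The elements defined above (state, decision, random outcome, transition function, and additive cost) can be illustrated with a short simulation. The sketch below is only an illustration under stated assumptions: the inventory setting, the order-up-to policy, and every cost coefficient are hypothetical and do not come from the cited texts.

```python
import random

def simulate_total_cost(x0, policy, f, g, g_terminal, sample_w, N, rng):
    """Roll the system forward N epochs and accumulate the additive cost."""
    x, total = x0, 0.0
    for k in range(N):
        u = policy(k, x)            # decision u_k chosen from the current state x_k
        w = sample_w(k, x, u, rng)  # random outcome w_k
        total += g(k, x, u, w)      # stage cost g_k(x_k, u_k, w_k)
        x = f(k, x, u, w)           # transition x_{k+1} = f_k(x_k, u_k, w_k)
    return total + g_terminal(x), x  # add the terminal cost g_N(x_N)

# Hypothetical inventory example: x = stock on hand, u = units ordered, w = demand.
def order_up_to_two(k, x):
    return max(0, 2 - x)

def random_demand(k, x, u, rng):
    return rng.choice([0, 1, 2])

def next_stock(k, x, u, w):
    return max(0, x + u - w)

def stage_cost(k, x, u, w):
    # ordering cost + holding cost on leftover stock + penalty on unmet demand
    return 1.0 * u + 0.5 * max(0, x + u - w) + 3.0 * max(0, w - x - u)

def no_terminal_cost(x):
    return 0.0

cost, final_stock = simulate_total_cost(
    0, order_up_to_two, next_stock, stage_cost, no_terminal_cost,
    random_demand, N=5, rng=random.Random(0),
)
```

Averaging `cost` over many independent runs would estimate the expected total cost of the fixed policy.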

Dynamic programming problems are solved using Bellman’s equation [10] (extracted
from [12])

$$J^*(x) = \min_{u \in U(x)} E_{w} \left\{ g(x,u,w) + J^*(f(x,u,w)) \right\} \tag{2.16}$$

This holds true when moving forward from any state in which the system may be.
This is the principle of optimality, which (in words) states that the remaining
decisions of an optimal policy are themselves optimal for the problem that starts
from whatever state has been reached. This is a powerful fact, and in many cases
it can be used to find all optimal paths (or decisions) from any state to any other
set of states. One example of this is value iteration. During value iteration, each
possible state is iterated over, and the optimal decision is determined. From these
optimal decisions, transitions to a new state will occur, for which value
iteration has also provided the optimal decision.
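
A minimal sketch of value iteration on a small example may help make the recursion concrete. It assumes a finite state space, known one-step transition probabilities, and a discount factor `alpha` added so that the iteration converges; the two-state problem (`toy_p`, `toy_g`) is hypothetical and is not taken from the cited texts.

```python
def value_iteration(p, g, alpha=0.9, tol=1e-10, max_iter=10_000):
    """p[i][u] = {j: probability}; g(i, u, j) = stage cost; alpha = discount factor."""

    def backup(i, u, J):
        # Expected stage cost plus discounted cost-to-go for decision u in state i.
        return sum(prob * (g(i, u, j) + alpha * J[j]) for j, prob in p[i][u].items())

    J = {i: 0.0 for i in p}
    for _ in range(max_iter):
        # Apply the Bellman recursion to every state.
        J_new = {i: min(backup(i, u, J) for u in p[i]) for i in p}
        converged = max(abs(J_new[i] - J[i]) for i in p) < tol
        J = J_new
        if converged:
            break
    # Read off a decision that attains the minimum in each state.
    policy = {i: min(p[i], key=lambda u: backup(i, u, J)) for i in p}
    return J, policy

# Hypothetical problem: state 1 is a cost-free absorbing state; from state 0,
# "stay" costs 1 and remains in 0, while "go" costs 2 and reaches 1 half the time.
toy_p = {
    0: {"stay": {0: 1.0}, "go": {0: 0.5, 1: 0.5}},
    1: {"stay": {1: 1.0}},
}

def toy_g(i, u, j):
    if i == 1:
        return 0.0
    return 1.0 if u == "stay" else 2.0

J_star, policy = value_iteration(toy_p, toy_g)
```

Every state and every decision is enumerated at each sweep, which is the source of the computational burden discussed in the following section.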

2.2.2  Introduction. 

Though the traditional methods for dynamic programming are very powerful,
they fall victim to computational intractability as problem size increases: the curses
of dimensionality. As such, means of mitigating this computational inefficiency must
be  examined.  One  such  methodology  is  approximate  dynamic  programming.  Powell 
[82]  presents  numerous  examples  of  approximate  dynamic  programming  for  resource 
allocations  problems.  Because  of  the  underlying  sequential  structure  of  the  DWTA 
(and  the  possibility  for  selecting  assignments  in  a  sequential  nature  in  the  SWTA), 
small  instances  of  the  WTA  problem  may  be  solved  using  exact  methods.  However, 
as  shown  in  the  literature  reviewed  in  Section  2.1,  the  tractability  of  these  techniques 
decreases as problem size increases. Exact dynamic programming suffers
from the same issues. For many solution methods (value iteration, policy iteration,
and their variants), each state, decision, and (if present) possible outcome (random
event or exogenous information process) needs to be iterated over; these are known
as the three curses of dimensionality. As such, computational effort increases
exponentially as problem size increases. Several texts ([15] [13] [82]) are dedicated to
presenting methods which address the curses of dimensionality of dynamic programs.
Powell [82] states that all dynamic programs are able to be written in terms of a
recursive relationship relating the expected cost (or reward) of being in a given state
at a point in time, to the expected cost (or reward) of each possible future state.
Evaluating this relationship can cause the required computation to grow exponentially
in the size of the state, decision, or outcome spaces.


A  tutorial  at  the  2013  Industrial  and  Systems  Engineering  Research  Conference 
given  by  Dan  Adelman  of  The  University  of  Chicago  Booth  School  of  Business  [1] 
provides  an  insightful  overview  of  the  available  approximate  dynamic  programming 
methods  used  to  date.  Figure  1  shows  this  hierarchy. 


[Figure 1 image not reproduced; legible label: "Sutton & Barto (1998)"]

Figure 1. Approximate Dynamic Programming Methodologies


Though the state-of-the-art in approximate dynamic programming has made many
advances since the dates provided in Figure 1, such as the 2nd edition of Powell’s
Approximate Dynamic Programming [82] and the 4th edition of Bertsekas’ Dynamic
Programming and Optimal Control [11] texts, Figure 1 gives a good breakdown from
which to start. Specifically, it differentiates between the exact and simulation-based
approximation methods. It is important to note that in each case, methods are used
to approximate some portion of the problem, usually the expected future costs (or
rewards), often called the “cost-to-go” or “value” function. This cost-to-go function
is the $J^*(f(x,u,w))$ expressed in Equation 2.16. Whether this is done through some
type of explicit mathematical programming method, or through Monte Carlo (or
other) simulation, these techniques are designed to exploit the special structure of the
specific problem to compute nearly optimal solutions using a fraction of the
computational requirements. Powell [82] also provides a list of problems
that must be addressed when applying approximate dynamic programming
in general:

•  Forward  dynamic  programming  avoids  looping  over  all  possible  states,  but  still 
requires  an  explicit  understanding  of  the  one-step  transition  matrix  and  the 
possible  states  the  system  may  transition  to. 

•  Values are only available for states that have been visited, yet the values of
the states which may be visited next are also needed.

•  Certain policies may cause the system to never visit states which, in the exact
formulations, would net good solutions.

•  Each problem is unique, and while the approximate dynamic programming
strategy is rather general, it cannot provide a mechanism for determining what
will work best for the specific problem.

Using these principles, an overview of some of the more widely used techniques in
the literature is now provided, with the goal of providing a sufficient spread while
maintaining generality and conciseness. The focus is also restricted to the
simulation-based techniques found in [13] and [82].
2.2.3 Lookup Tables and Q-Learning.


In  some  cases,  a  system  may  be  so  complex  that  it  cannot  be  explicitly  modeled 
mathematically.  One  example  would  be  a  complex  simulation  which  needs  to  be 
optimized. The key to using Q-Learning is that the behaviors of the system are able
to be observed directly, and controls are able to be placed on the system iteratively.
For each state $i$ (notation for the state is temporarily changed here to allow a
more concise description of transitions) and decision $u \in U(i)$, define the
optimal Q-factor of the pair $(i,u)$ by


$$Q^*(i,u) = \sum_{j=1}^{n} p_{ij}(u) \left( g(i,u,j) + \alpha J^*(j) \right). \tag{2.17}$$

For these problems, instead of approximating the cost-to-go function for the selected
policy, at each iteration the Q-factors for each state are updated. This allows
the multiple policy evaluation steps of policy iteration to be avoided. Instead, use
value iteration on $Q_{k+1} = FQ_k$ defined by


$$(FQ)(i,u) = \sum_{j=1}^{n} p_{ij}(u) \left( g(i,u,j) + \alpha \min_{v \in U(j)} Q(j,v) \right), \quad \forall (i,u). \tag{2.18}$$


This recursion is equivalent to a discounted Bellman equation, and
the algorithm converges to $Q^*$ from any starting point $Q_0$. In words, the Q-factors are
statistical estimates of the true future cost (or reward) given a state and action. This
is beneficial because, instead of needing an explicit transition function, simulation
outputs are used to iteratively update the estimates.
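
The idea of updating Q-factors from simulation outputs alone can be sketched as follows. This is a generic tabular Q-learning loop written for illustration, not the procedure of any cited reference: the uniform sampling of state-decision pairs, the `1 / visits` step size, and the two-state simulator `toy_simulate` are all assumptions made here.

```python
import random

def q_learning(actions, simulate, alpha=0.9, samples=40_000, seed=0):
    """actions[i] lists the decisions in state i; simulate(i, u, rng) returns
    (next_state, cost). No transition probabilities are supplied to the learner."""
    rng = random.Random(seed)
    states = list(actions)
    Q = {(i, u): 0.0 for i in states for u in actions[i]}
    visits = {pair: 0 for pair in Q}
    for _ in range(samples):
        i = rng.choice(states)        # sample a state-decision pair uniformly
        u = rng.choice(actions[i])
        j, cost = simulate(i, u, rng)  # observe one simulated transition
        visits[(i, u)] += 1
        step = 1.0 / visits[(i, u)]    # diminishing step size (an assumption)
        target = cost + alpha * min(Q[(j, v)] for v in actions[j])
        Q[(i, u)] += step * (target - Q[(i, u)])
    return Q

# Hypothetical simulator: state 1 is absorbing and free; from state 0, "stay"
# costs 1 and remains in 0, while "go" costs 2 and reaches 1 half the time.
toy_actions = {0: ["stay", "go"], 1: ["stay"]}

def toy_simulate(i, u, rng):
    if i == 1:
        return 1, 0.0
    if u == "stay":
        return 0, 1.0
    return (1 if rng.random() < 0.5 else 0), 2.0

Q = q_learning(toy_actions, toy_simulate)
greedy_in_state_0 = min(toy_actions[0], key=lambda u: Q[(0, u)])
```

The learner only calls the simulator; the probability 0.5 inside `toy_simulate` is never read directly, which mirrors the setting in which the system can be observed but not modeled explicitly.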
2.2.4 Approximate Value Iteration.


Next, consider an approach in which a model is known, but the specific one-step
transition probabilities are unable to be determined. For this, Powell [82] suggests
randomly generating, at each iteration, a sample of $K$ possible outcomes of what may
occur in the system (i.e. the random occurrence $w_k$) and selecting the probability that
each of those randomly generated outcomes will occur. One such recommendation
is to let $p_n(w_i) = \frac{1}{K}$ (here the probability is indexed by $n$, denoting the iteration
for which the outcomes have been generated). The expected total costs are then
approximated using the standard recursions of (2.16) using the generated outcome
space. Next, the estimate of the value is updated using

$$J_n(x_n) = (1 - \alpha_{n-1}) J_{n-1}(x_n) + \alpha_{n-1} v_n \tag{2.19}$$

where $v_n$ is the approximation discussed above. As will be seen in many of the
applications of approximate dynamic programming, the stochastic smoothing equation (2.19)
attempts to use observations of the inherently noisy data to approximate the moments
of the actual distribution from which the observations are being drawn. Powell
[82] provides extensive details on selection of the step size, $\alpha$, and a rigorous discussion
of convergence properties for many instances.
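
The sampling and smoothing steps can be sketched for a single state as follows. The sketch assumes $K$ equally weighted sampled outcomes per iteration and the step size $1/n$; the demand distribution and cost function are hypothetical placeholders rather than anything taken from [82].

```python
import random

def smoothed_value_estimate(sample_outcome, cost, K=10, iterations=200, seed=0):
    """Estimate an expected cost by smoothing noisy sampled approximations v_n."""
    rng = random.Random(seed)
    J = 0.0
    history = []
    for n in range(1, iterations + 1):
        outcomes = [sample_outcome(rng) for _ in range(K)]
        v_n = sum(cost(w) for w in outcomes) / K  # each sampled outcome has weight 1/K
        step = 1.0 / n                            # plays the role of alpha_{n-1}
        J = (1.0 - step) * J + step * v_n         # smoothing update of Equation (2.19)
        history.append(v_n)
    return J, history

# Hypothetical example: the outcome is a demand of 0, 1, or 2 and the cost equals it.
J_hat, v_history = smoothed_value_estimate(
    sample_outcome=lambda rng: rng.choice([0, 1, 2]),
    cost=lambda w: float(w),
)
```

With the step size $1/n$, the smoothed estimate equals the running average of the observations $v_n$; other step size rules weight recent observations differently.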

2.2.5  Low-Dimensional  Value  Function  Approximation. 

The next method discussed concerns itself with reducing the dimensionality of
the problem by combining states into aggregate states. This method is effective
in the context of approximate dynamic programming because the aggregated
states are used to determine the cost-to-go approximation, and at each iteration all
states are iterated over [82]. In traditional methods, the aggregated states can also
be enumerated over, but in many cases this leads to poor estimations of the problem
solution. Another way this method may help is by taking a continuous state space
and discretizing it to use traditional methods. Next, the aggregation framework of
Bertsekas [13] is introduced. Let $\mathcal{A}$ be a finite set of aggregate states, and define a
disaggregation probability $d_{xi}$ such that

$$\sum_{i=1}^{n} d_{xi} = 1, \quad \forall x \in \mathcal{A} \tag{2.20}$$

where $x$ is an aggregate state and $i$ is the original system state. Then, for each
aggregate state $y$ and original system state $j$, the aggregation probability $\phi_{jy}$ satisfies

$$\sum_{y \in \mathcal{A}} \phi_{jy} = 1, \quad \forall j = 1, \ldots, n \tag{2.21}$$

Note that $d_{xi}$ is essentially the proportion for which $x$ is represented by $i$, and
$\phi_{jy}$ is the “degree of membership of $j$ in the aggregate state $y$.” Define the matrices
$D = [\{d_{xi} \mid i = 1, \ldots, n\}]$ and $\Phi = [\{\phi_{jy} \mid y \in \mathcal{A}\}]$. For clarity, the elements
of these sets represent the elements of their respective matrices. Then an approximation
of the solution of Bellman’s equation is obtained by $\Phi R$ where

$$R = DT(\Phi R) \tag{2.22}$$

Here $T$ is the recursion operator defined in Bertsekas [13].

One example of aggregation which should be applicable to this research considers
the multi-stage weapon-target assignment problem formulated by Hosein [48]. At each
time step, all future stages can be aggregated into a static weapon-target
assignment problem with the remaining weapons and targets.
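
The fixed-point relation of Equation (2.22) can be sketched for the simplest case, often called hard aggregation, where every original state belongs to exactly one aggregate state and the disaggregation weights are uniform within a group. Those two choices, the discounted form of the recursion operator, and the four-state chain below are assumptions made for illustration only.

```python
def aggregated_value_iteration(p, g, groups, alpha=0.9, iterations=500):
    """p[i][u] = {j: probability}; g(i, u, j) = stage cost;
    groups = list of lists partitioning the original states."""
    member_of = {i: y for y, members in enumerate(groups) for i in members}
    R = [0.0] * len(groups)
    for _ in range(iterations):
        # Phi R: each original state inherits the value of its aggregate state.
        J = {i: R[member_of[i]] for i in p}
        # T(Phi R): one Bellman backup on the original states.
        TJ = {
            i: min(
                sum(prob * (g(i, u, j) + alpha * J[j]) for j, prob in p[i][u].items())
                for u in p[i]
            )
            for i in p
        }
        # D T(Phi R): uniform disaggregation weights within each group.
        R = [sum(TJ[i] for i in members) / len(members) for members in groups]
    return R

# Hypothetical chain 0 -> 1 -> 2 -> 3: each move costs 1 and state 3 is absorbing and free.
chain_p = {
    0: {"move": {1: 1.0}},
    1: {"move": {2: 1.0}},
    2: {"move": {3: 1.0}},
    3: {"stay": {3: 1.0}},
}

def chain_g(i, u, j):
    return 0.0 if i == 3 else 1.0

R_exact = aggregated_value_iteration(chain_p, chain_g, [[0], [1], [2], [3]])
R_coarse = aggregated_value_iteration(chain_p, chain_g, [[0, 1], [2], [3]])
```

With singleton groups the iteration reduces to ordinary value iteration. Merging states 0 and 1 gives a single value for both, which differs from either exact value and shows the approximation error that aggregation introduces.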
2.2.6 Adaptive Estimation.


Adaptive estimation algorithms are broad and are also centered around the relationship
shown in Equation (2.19). The primary idea is that we are trying to estimate
a value $g(x)$ for being in state $x$, and $\hat{g}(x)$ is a somewhat randomized estimate of $g(x)$.
A stochastic gradient algorithm then provides the result of Equation (2.19). There are
many types of methods under Adaptive Estimation, such as recursive least squares,
approximate value iteration, least squares temporal differences, and least squares policy
evaluation. These methods are used to determine the average cost of being in
each state. For the purposes of this dissertation, each method is introduced with a
short example of the type of problem to which it is applicable. For further information,
Powell [82] provides insightful explanations of these methods, giving closed-form
derivations using a single state. Another term (used by Bertsekas) for Adaptive Estimation
algorithms is Approximate Policy Iteration.

Recursive  Least  Squares 

Recursive least squares generates approximations for the system
using a linear combination of basis functions $\Phi_f(x)$, where $f \in \mathcal{F}$ is considered a
feature. The approximation is then

$$\bar{J}(x) = \sum_{f \in \mathcal{F}} \beta_f \Phi_f(x) = \Phi(x)^T \beta \tag{2.23}$$

where $\beta$ are traditional regression coefficients. These techniques are able to be applied
any time the value function can successfully be approximated using linear regression.
This art is left to the reader for a specified problem. Specific methods exist for
cases where the analyst has stationary data, non-stationary data, and where multiple
observations are obtainable.
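
A minimal sketch of the recursive update for the stationary case is given below. It uses the standard recursive least squares recursion with a large diagonal initial matrix; the initialization scale, the two basis functions (a constant and the state itself), and the noise-free observations are assumptions chosen so the result is easy to check.

```python
def rls_update(beta, B, phi, y):
    """One recursive least squares step for a new observation y with features phi."""
    d = len(phi)
    B_phi = [sum(B[r][c] * phi[c] for c in range(d)) for r in range(d)]
    gamma = 1.0 + sum(phi[r] * B_phi[r] for r in range(d))
    error = y - sum(beta[r] * phi[r] for r in range(d))  # prediction error
    new_beta = [beta[r] + B_phi[r] * error / gamma for r in range(d)]
    new_B = [[B[r][c] - B_phi[r] * B_phi[c] / gamma for c in range(d)] for r in range(d)]
    return new_beta, new_B

def fit_recursively(observations, d, scale=1e6):
    beta = [0.0] * d
    # A large initial B corresponds to very weak prior information on beta.
    B = [[scale if r == c else 0.0 for c in range(d)] for r in range(d)]
    for phi, y in observations:
        beta, B = rls_update(beta, B, phi, y)
    return beta

# Hypothetical data: observed values follow 2 + 3x exactly, with features (1, x).
data = [([1.0, float(x)], 2.0 + 3.0 * x) for x in range(10)]
beta_hat = fit_recursively(data, d=2)
```

Each observation is processed once and then discarded, so the coefficients can be refreshed as new simulated values arrive without refitting the full regression.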


Least  Squares  Temporal  Differences 


Identified by Powell [82] as one of the more powerful and attractive tools in approximate
dynamic programming, temporal differences provide a means of updating
a functional approximation. At each iteration, estimates of the least squares
regression coefficients $\beta$ can then be updated. This method fixes a policy and then
finds the best fit for the linear model. Additionally, the standard transition function
is used to determine the next state to visit using this fixed policy. This is also known as
on-policy learning [82]. The reason this method is so powerful is that it combines
techniques which allow the user to obtain regression coefficient estimates and uses
them in the traditional approximate dynamic programming solution framework.

Least  Squares  Policy  Evaluation 

Least squares policy evaluation uses basis functions developed for infinite horizon
applications. At the nth iteration, the regression coefficients are determined by

$$\beta_n = \arg\min_{\beta} \sum_{i} \left( \sum_{f} \beta_f \Phi_f(x_i) - \left( C_i + \gamma J_{n-1}(x_{i+1}) \right) \right)^2 \tag{2.24}$$

where $C_i$ is a random variable providing the ith contribution [to the value function]
(this is considered a one-period contribution at the ith step in the infinite horizon)
and $\gamma$ is the discount factor.
Again, this method is just another way of determining the expected reward gained
by being in state $x$ to help compute average long-run rewards.

2.2.7 Issues of Simulation-Based Cost Approximation.

As discussed by Bertsekas [13], these methods primarily concern themselves with
optimizing over an approximated single (or multi)-step lookahead approach. Determining
these approximations is where the mathematics and art of dynamic programming
merge. Getting an appropriate approximation can take both time and effort,
and may not provide a robust methodology for solving problems which are closely
related to the original. Another issue arises with the statistical testing of the
approximations, determining the rate of convergence, and solution quality. Each of the
methods presented (and others found in the literature) has its benefits and drawbacks.
Some may be the correct choice for the problem being investigated, and others
may not be useful at all. The analyst has the task of generating an appropriate model
(if available) and determining which solution technique(s) should be applied.

2.2.8  Approximate  Dynamic  Programming  for  Resource  Allocation. 

Several articles apply approximate dynamic programming to various resource allocation
instances. This section is not intended to be a full literature review of these
applications. Instead, the common themes amongst these papers are captured, and
the feasibility of approximate dynamic programming as a solution technique for the
WTA problem is developed. Powell has done a substantial amount of applied approximate
dynamic programming work in resource allocation within the transportation
industry [80] [81] [84] [82]. One structural factor that is exploited is the declining
marginal return of assigning an additional weapon to any given single target.
As such, the value function is concave. Godfrey and Powell [37] have developed a
method for approximating concave functions and have successfully applied it to a
number of practical applications [80] [94]. Castanon has also done work in approximate
dynamic programming for resource control, to include sensor management [21],
multiplatform path planning [77], and stochastic scheduling (along with Bertsekas)
[14]. Another area which has a significant amount of literature is vehicle routing
with stochastic demands [73] [86] [87] [3]. Other resource allocation applications include
activity networks for project planning [32] [93], model predictive control [22],
and high-dimensional generalized resource allocation [81], among others.
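
The declining marginal return mentioned above can be checked numerically. The sketch assumes independent shots with a common single shot kill probability $p$ against a target of value $V$, so that the expected value destroyed by $x$ weapons is $V(1 - (1-p)^x)$; the numbers used are hypothetical.

```python
def expected_value_destroyed(V, p, x):
    """Expected value destroyed when x independent shots, each with kill
    probability p, are assigned to a single target of value V."""
    return V * (1.0 - (1.0 - p) ** x)

def marginal_returns(V, p, max_weapons):
    """Gain in expected value from each additional weapon, first to last."""
    return [
        expected_value_destroyed(V, p, x + 1) - expected_value_destroyed(V, p, x)
        for x in range(max_weapons)
    ]

gains = marginal_returns(V=10.0, p=0.6, max_weapons=4)
```

Under these assumptions each additional weapon adds $V p (1-p)^x$, a sequence that shrinks geometrically, which is the concavity that the cited approximation methods exploit.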




2.3  Summary 


This chapter presents a review of relevant literature as background
for the goals of this dissertation. The key themes of this literature review are:

1.  The  complexity,  diversity,  and  flexibility  of  the  WTA  problem 

2.  The  flexibility  and  applicability  of  approximate  dynamic  programming  as  a 
solution  for  resource  allocation  problems 

Given the literature, there are gaps which must be covered to address the motivating
problem. First, a more practical formulation for the DWTA must be developed
that considers dynamic weapons capabilities. Development of this new formulation
requires solution methodologies not found in the literature. As a solution methodology,
approximate dynamic programming is often used for large resource allocation
problems. However, because of the structure and complexity of the WTA problem,
the size of the decision space is often prohibitively large. Therefore, approximate
dynamic programming methodologies which address this issue are investigated. The
research developed in this dissertation specifically addresses each of these gaps.
III. Optimal multi-stage allocation of weapons to targets
using  adaptive  dynamic  programming 


3.1  Abstract 

We  consider  the  optimal  allocation  of  resources  (weapons)  to  a  collection  of  tasks 
(targets)  with  the  objective  of  maximizing  the  reward  for  completing  tasks  (destroy¬ 
ing  targets).  Tasks  arrive  in  two  stages,  where  the  first  stage  tasks  are  known  and 
the  second  stage  task  arrivals  follow  a  random  distribution.  Given  the  distribution  of 
these  second  stage  task  arrivals,  simulation  and  mathematical  programming  are  used 
within  a  dynamic  programming  framework  to  determine  optimal  allocation  strategies. 
The  special  structure  of  the  assignment  problem  is  exploited  to  recursively  update 
functional  approximations  representing  future  rewards  using  subgradient  information. 
Through  several  theorems,  optimality  of  the  algorithm  is  proven  for  a  two-stage  Dy¬ 
namic  Weapon-Target  Assignment  Problem. 

3.2  Introduction 

The weapon-target assignment (WTA) problem is a model of combat operations
where we maximize the total expected damage caused to the enemy’s targets (or
minimize the value of leaker missiles) using a finite number of weapons. Optimally
assigning interceptors to targets is a subject that has become increasingly important
with the proliferation of intercontinental ballistic missiles (ICBMs). The WTA problem
is known to be NP-complete [60]. In general, two cases of the WTA problem
are considered, static and dynamic. The static case allocates m weapons to n targets
at one time after all problem information is known. The dynamic case provides an
allocation policy over some time horizon, during which more information may arrive as
time progresses. Generally, both formulations contain at least stochastic single shot
kill probabilities for weapon-target pairs, and many include additional uncertainties.
One example of a dynamic problem is as follows. Suppose there are two waves of
incoming ICBMs where the number of targets (ICBMs), n, and their values, $V_j$, in
the first wave are known and the second wave is known only up to a probability
distribution. If the single shot probability of the weapon (interceptor) successfully hitting
a target is p, and each shot’s outcome is independent of the outcome of any other
shot, then the decision space for a fixed number of interceptors consists of how many
interceptors to allocate to the first wave versus the number of interceptors reserved for
assignment to the second wave. This formulation is attributed to Murphey [71], who
proposes a stochastic decomposition approximation technique. This chapter provides
an optimal solution for the formulation of [71] by exploiting the special structure of
the problem.

3.3  Literature  Review 

3.3.1  Static Weapon-Target Assignment.

The SWTA is formulated as follows. Let $V_j$ denote the value of the jth target and $W_i$
denote the number of available weapons of type i. We assume we have m weapon
types and n targets. Let $p_{ij}$ be the single shot probability that a weapon of type i will
kill a target of type j, such that the single shot probability of survival is $q_{ij} = 1 - p_{ij}$.
Our decision variable $x_{ij}$ is the number of weapons of type i assigned to target j. The
defensive SWTA problem is then formulated as a nonlinear integer program:

$$\min \sum_{j=1}^{n} V_j \prod_{i=1}^{m} q_{ij}^{x_{ij}} \tag{3.1}$$

subject to

$$\sum_{j=1}^{n} x_{ij} \le W_i \quad \text{for all } i = 1, 2, \ldots, m, \tag{3.2}$$

$$x_{ij} \ge 0 \text{ and integer, for all } i = 1, 2, \ldots, m, \; j = 1, 2, \ldots, n. \tag{3.3}$$

Much of the WTA literature has been dedicated to the SWTA problem formulation,
which was shown to be NP-complete in 1986 by Lloyd and Witsenhausen [60].
As such, much research has been done in the past several decades to determine
efficient methods of identifying optimal solutions. Computationally efficient optimal
methods exist for two cases of the SWTA under simplifying assumptions. First, given
a homogeneous weapon set, $p_{ij} = p_j$ for all i, denBroeder [30] shows optimality is
achieved by evenly distributing the weapons across as many targets as possible using
the maximum marginal return (MMR) algorithm. The second instance assumes that
each target can have at most one weapon assigned to it [24] [75]. Because our problem
focuses on a special instance of the dynamic WTA (DWTA) problem, we focus our
literature review there.
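
The MMR rule for the homogeneous-weapon case can be sketched in a few lines. The Python below is an illustrative sketch only: the function name `mmr` and its arguments are assumptions made here and are not taken from [30]. It assigns identical weapons one at a time to the target with the largest marginal return $S_j(1 - q_j)$, where $S_j$ is the expected surviving value of target j.

```python
import heapq

def mmr(values, q, num_weapons):
    """Greedy maximum-marginal-return allocation of identical weapons.

    values[j] is the target value V_j and q[j] its single-shot survival
    probability. Returns (allocation, expected destroyed value).
    """
    alloc = [0] * len(values)
    surviving = list(values)  # S_j: expected surviving value of target j
    # Max-heap on marginal return S_j * (1 - q_j), stored negated for heapq.
    heap = [(-s * (1 - qj), j) for j, (s, qj) in enumerate(zip(surviving, q))]
    heapq.heapify(heap)
    total = 0.0
    for _ in range(num_weapons):
        if not heap:  # no targets to shoot at
            break
        neg_mr, j = heapq.heappop(heap)
        total += -neg_mr
        alloc[j] += 1
        surviving[j] *= q[j]  # the target survives this shot with probability q_j
        heapq.heappush(heap, (-surviving[j] * (1 - q[j]), j))
    return alloc, total

# Hypothetical instance: two targets worth 200 each, q = 0.5, three weapons.
alloc, total = mmr([200.0, 200.0], [0.5, 0.5], 3)
```

With identical targets the greedy rule spreads the weapons as evenly as possible, which matches the description above.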

3.3.2  Dynamic Weapon-Target Assignment.

Though it has not been researched to the extent of the SWTA problem, the
DWTA problem provides a more practical implementation by considering the impact
current decisions have on future states. However, by breaking the problem up into
several decision epochs, the DWTA becomes a much more complex problem. Similar to
the SWTA, numerous methods have been employed to provide solutions for various
types of DWTA problems. As the originator of the dynamic instance, Hosein [47]
provides several results which are generalizable to the DWTA problem. Additionally,
Castanon [20] and others at ALPHATECH were developing advanced algorithms
for the DWTA in parallel. Murphey [70] [71] uses stochastic decomposition for the
two-stage problem defined in Sect. 3.4. Chang [24] uses a static WTA approximation
scheme within an iterative linear network flow framework to efficiently provide
high-quality solutions for the DWTA. Because of the integer restriction for the decision
variables, the chromosome representation within a genetic algorithm (GA) presents a
useful scheme for solving both the static and dynamic versions of the WTA problem. As
such, much work has developed hybrid GAs to assist in solving the DWTA. Wu et al. [99]
apply a modified GA to the DWTA and introduce weapon use deadlines within the problem
formulation. Xin et al. [101] develop a heuristic which uses problem information
(domain knowledge) and constraint programming to assign priorities to assignments.
Evolutionary heuristics which use a hybridized GA with memetic algorithms have
also been applied to the DWTA [25]. Additionally, Khosla [54] applies a hybrid
heuristic which uses a simulated annealing (SA) type heuristic to determine the fitness
of a population within a GA framework. Other heuristic techniques applied to the
DWTA include Tabu Search [102], ACO with tabu table updates [103], and a modified
Hungarian method with PSO [56] (though this is in an open source text, so its rigor
may be unverified). Lastly, exact dynamic programming [89] [91] has also been applied
to the DWTA.

3.4  Problem  Formulation 

As a simplification to the multi-stage problem posed by Hosein [48], Murphey [69]
defined a two-stage stochastic programming model of the DWTA problem. In this
model, we consider an adversary that has a total stockpile of weapons
and shoots a portion of them in the first stage, with the remainder of the weapons
known only up to a probability distribution. Let $n_1$ targets arrive in stage 1 with certainty,
and $n_2$ targets arrive in stage 2 according to a known distribution. Let the random
vector $\omega \in \Omega$ denote the second stage target arrivals, where $\Omega$ is the set of
all possible arrivals. Suppose the probabilities of survival, $q_j$, and target values, $V_j$,
for each target j are given. Then the 2-stage WTA programming formulation is:

$$Z_1(x) = \max \sum_{j=1}^{n_1} V_j^{(1)} \left(1 - \left(q_j^{(1)}\right)^{x_j^{(1)}}\right) + E_{\omega \in \Omega}\left[Z_2(x^{(2)}, \omega)\right] \tag{3.4}$$

subject to

$$\sum_{j=1}^{n_1} x_j^{(1)} \le M,$$

$$x^{(1)} \le b,$$

$$x_j^{(1)} \in \mathbb{Z}_+, \quad j = 1, \ldots, N.$$

$E_{\omega \in \Omega}[Z_2(x^{(2)}, \omega)]$ is the expected second stage value. Given a number of
second stage weapons, $x^{(2)}$, and a sample realization of targets, $\omega$, the second stage
value is piecewise integer concave (for a proof of this, see [2]) and is solved using the
MMR algorithm. $\sum_{j=1}^{n_1} x_j^{(1)} \le M$ is the resource capacity constraint, and b is the
vector denoting the maximum weapons that can be assigned to any one target.

$Z_2(x^{(2)}, \omega)$ is the solution to the second stage problem and is expressed as:

$$Z_2(x^{(2)}, \omega) = \max \sum_{j=1}^{n_2(\omega)} V_j^{(2)}(\omega) \left(1 - q_j^{(2)}(\omega)^{x_j^{(2)}}\right) \tag{3.5}$$

subject to

$$\sum_{j=1}^{n_1} x_j^{(1)} + \sum_{j=1}^{n_2(\omega)} x_j^{(2)} = M,$$

$$x^{(2)} \le b,$$

$$x_j^{(2)} \in \mathbb{Z}_+.$$

3.5  Theoretical Results


In this section we discuss the methodology used to solve the two-stage
DWTA problem presented above and present the theoretical results for our solution.
Instead of using the cutting plane approach of Murphey [70], we formulate the problem
as a dynamic programming problem and develop a solution algorithm with the help of
the post-decision dynamic programming formulation and the concave adaptive value
estimation (CAVE) functional approximation algorithm developed by Godfrey and
Powell [39].

3.5.1  Adaptive  Dynamic  Programming. 

Consider a general finite space and discrete time horizon dynamic programming
problem. Let S be the state space of the system with time horizon t = 0, ..., T.
The state $S_t \in S$ represents the state of the system at time t, and a decision $x_t$ that
acts on the system is selected from a finite set U at each time step. $W_t$ is a random
occurrence generated with a known probability distribution, and the system evolves
according to a transition function which has the form

$$S_{t+1} = f_1(S_t, x_t, W_t) \tag{3.6}$$

where $f_1(\cdot)$ is a function describing the system dynamics. Next, define the one-period
contribution for being in state $S_t$ and making decision $x_t$ as $C_t(S_t, x_t)$ and express
the T-stage value to be maximized as the expected value of the summation of the
one-period contributions:


$$\max_{x_t \in U(S_t)} E\left[ \sum_{t=0}^{T} C_t(S_t, x_t) \,\middle|\, S_0 \right] \tag{3.7}$$

It is well known that problems of the form given in (3.7) can be solved by Bellman’s
optimality equations [12]:

$$J_t(S_t) = \max_{x_t} \left( C_t(S_t, x_t) + E_{W_t}\left\{ J_{t+1}(S_{t+1}(S_t, x_t, W_t)) \mid S_t \right\} \right) \tag{3.8}$$

Problems of this type grow exponentially in the state, decision, and outcome
spaces, a difficulty known as the curses of dimensionality. Therefore it is necessary to
approximate the value function. Adaptive Dynamic Programming provides a
means for stepping forward through time iteratively using sample realizations of our
approximated value function.
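
For contrast with the approximate approach, exact backward induction on Bellman's equations can be sketched for a small finite problem. The Python below is a minimal sketch, not the dissertation's implementation: the function `backward_induction`, its callback arguments, and the toy instance (two weapons, one fresh target worth 10 in each of two periods, survival probability 0.5) are all assumptions introduced for illustration.

```python
def backward_induction(states, decisions, contribution, transition, horizon):
    """Solve J_t(S) = max_x { C_t(S, x) + E[J_{t+1}(S')] } for t = horizon, ..., 0.

    decisions(s)          -> iterable of feasible decisions in state s
    contribution(t, s, x) -> one-period contribution C_t(s, x)
    transition(t, s, x)   -> list of (next_state, probability) pairs
    """
    value = {s: 0.0 for s in states}  # terminal condition: J_{T+1} = 0
    policy = {}
    for t in range(horizon, -1, -1):
        new_value = {}
        for s in states:
            best_x, best_v = None, float("-inf")
            for x in decisions(s):
                v = contribution(t, s, x) + sum(
                    p * value[s2] for s2, p in transition(t, s, x))
                if v > best_v:
                    best_x, best_v = x, v
            new_value[s] = best_v
            policy[(t, s)] = best_x
        value = new_value
    return value, policy

# Toy instance: the state is the number of weapons left (0, 1 or 2).
value, policy = backward_induction(
    states=[0, 1, 2],
    decisions=lambda s: range(s + 1),                     # fire 0..s weapons
    contribution=lambda t, s, x: 10.0 * (1 - 0.5 ** x),   # expected kill value
    transition=lambda t, s, x: [(s - x, 1.0)],            # deterministic stock
    horizon=1,
)
```

The loops visit every state and decision in every period, which is exactly the enumeration that becomes intractable as the state, decision, and outcome spaces grow.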

3.5.2  Two-Stage  DWTA  ADP  Solution. 

Our method uses Monte Carlo sampling of second-stage target arrivals to approximate
our value function. By making use of the concavity of the stage 2 function, we
have developed an algorithm which optimally determines the number of interceptors
needed in the second stage. Given a fixed number of weapons and a sample realization
of stage 2 targets, $n_2$, it is clear that $f(n_2) = \max_{x^{(2)}} \sum_j V_j \left(1 - q_j^{x_j^{(2)}}\right)$ is a piecewise
integer concave function. Here, $x_j^{(2)}$ denotes the number of weapons allocated to the
jth target in the second stage.

Using the post-decision state dynamic programming notation of Powell [83], if we
assume a piecewise linear concave approximation, then our second stage post-decision
value function becomes $J^x(S^x) = E_{\omega \in \Omega}[Z_2(x^{(2)}, \omega)]$. Our post-decision state is then
$S^x = n_2$, and for any given number of weapons, the slopes of our function represent
the marginal value of adding one more weapon to the second stage. As such, we
modify the MMR algorithm of denBroeder [30] while maintaining optimality for the
special case of the DWTA.

Algorithm: MMRPlus

Step 0: Given $J^x(S^x)$,

Initialize $x_j = 0$ for all j = 1, ..., N, and let $x_{N+1} = x^{(2)}$ denote the number of
weapons reserved for the second stage, also initialized to 0.

Set $S_j = V_j$ for j = 1, ..., N.

Compute the marginal returns $MR_j = S_j(1 - q_j)$ for all j = 1, ..., N, and
$MR_{N+1} = J^x(1) - J^x(0)$.
Initialize weapon index i = 1.

While $i \le M$, do

Step 1: Find the target k for which weapon i has the greatest effect. Compute

$k = \arg\max_{j = 1, \ldots, N+1} MR_j$

Step 2: Increment the allocation to target k: $x_k \leftarrow x_k + 1$.

If $k \le N$, update the expected surviving value $S_k = S_k q_k$, then update
the marginal return $MR_k = S_k(1 - q_k)$,
else (k = N + 1, so $x_{N+1}$ has just been incremented) update the marginal return
$MR_{N+1} = J^x(x_{N+1} + 1) - J^x(x_{N+1})$.
Set i = i + 1 and continue.
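
The steps above can be sketched in Python as follows. This is a hedged illustration rather than the author's code: the name `mmr_plus`, the list `stage2_slopes` (with entry k holding the approximate value of the (k+1)-th reserved weapon), the rule that ties go to the first stage, and the use of 0 beyond the known slopes are all assumptions made here.

```python
def mmr_plus(values, q, stage2_slopes, num_weapons):
    """Greedy split of weapons between N first-stage targets and a stage-2 reserve.

    values, q     : first-stage target values V_j and survival probabilities q_j
    stage2_slopes : stage2_slopes[k] approximates J(k + 1) - J(k) for the reserve
    Returns (x, x_reserve): per-target allocations and weapons held for stage 2.
    """
    n = len(values)
    x = [0] * n
    x_reserve = 0
    surviving = list(values)                              # S_j
    mr = [s * (1 - qj) for s, qj in zip(surviving, q)]    # MR_j for j = 1..N

    def reserve_mr(k):
        # Marginal value of the (k+1)-th reserved weapon; 0 past the known slopes.
        return stage2_slopes[k] if k < len(stage2_slopes) else 0.0

    for _ in range(num_weapons):
        k = max(range(n), key=lambda j: mr[j]) if n else None
        if k is not None and mr[k] >= reserve_mr(x_reserve):
            x[k] += 1                      # Step 2, first-stage branch
            surviving[k] *= q[k]
            mr[k] = surviving[k] * (1 - q[k])
        else:
            x_reserve += 1                 # Step 2, reserve branch
    return x, x_reserve

# Slopes taken from the analytic values reported in Section 3.6.
analytic = [99.61, 98.05, 91.80, 80.47, 63.87, 52.83]
x12, r12 = mmr_plus([200.0] * 8, [0.5] * 8, analytic, 12)
x13, r13 = mmr_plus([200.0] * 8, [0.5] * 8, analytic, 13)
```

When fed the analytic slopes, the sketch gives the 8/4 and 8/5 splits discussed in the computational results.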

We now prove the existence of a piecewise linear concave function and the optimality
of the MMRPlus algorithm.

Theorem 3.5.1  If $J^x(S^x = n_2) = E[Z(x^{(2)}, \omega)]$, the MMRPlus algorithm is optimal.

Proof: Given any scenario in stage 2, by the MMR algorithm for the WTA problem,
the solution is monotonic increasing and integer concave [30]. We represent the slopes
of each stage 2 function for the n scenarios as

$$c_1^j \ge c_2^j \ge c_3^j \ge \cdots \tag{3.9}$$

where $c_i^j$ denotes the marginal reward gained by saving the ith weapon for the second
stage, given the jth target arrival scenario. Let $p_j$, j = 1, ..., n, be the probability of
scenario j. Then, since $p_j \ge 0$ for all j, the inequalities remain valid after multiplying
them by their respective probabilities:

$$p_1 c_1^1 \ge p_1 c_2^1 \ge p_1 c_3^1 \ge \cdots$$
$$\vdots$$
$$p_n c_1^n \ge p_n c_2^n \ge p_n c_3^n \ge \cdots \tag{3.10}$$

Through term by term addition, the inequalities hold to obtain

$$\sum_{j=1}^{n} p_j c_1^j \ge \sum_{j=1}^{n} p_j c_2^j \ge \sum_{j=1}^{n} p_j c_3^j \ge \cdots \tag{3.11}$$

Therefore, $E[Z(x^{(2)}, \omega)]$ is monotone increasing and integer concave in $x^{(2)}$.

The resulting optimization problem at stage 1 is an integer optimization with a
monotropic value function for which each separable function is monotonic increasing
and piecewise integer concave. Since there is a single linear constraint coupling the
allocations to all targets, strong duality guarantees the existence of a scalar dual
variable $\lambda$ such that, at the optimal allocations $x_j^*$, j = 1, ..., N + 1, the right derivatives
of each separable function are less than or equal to $\lambda$, and the left derivatives are
greater than or equal to $\lambda$, and $\sum_j x_j^* = M$.

Since all the separable functions are piecewise integer concave, each function has
a finite number of slopes (derivative values). The MMRPlus algorithm searches over
the possible slopes of all the separable functions in decreasing order, modifying the
allocations $x_j$ appropriately until $\sum_j x_j = M$.

Since the function for each target and the function for the second stage expected
value are monotonic increasing, and MMRPlus always selects the function
with the greatest increase in objective value, the MMRPlus algorithm is optimal.  □

So that we do not have to compute a piecewise linear approximation $J^x(S^x)$ for
every $x^{(1)}$ by simulation for every $\omega$, our approach only focuses around the optimal
value of $x^{(1)}$. To find the piecewise-linear approximation $J^x(S^x)$ around the optimal
$x^{(1)}$, we use a version of the CAVE algorithm developed by Godfrey and Powell while
preserving concavity [37]. Given a realization of second stage target arrivals $\omega$ and the
current solution of the MMRPlus algorithm, where $x^{(2)} = M - x^{(1)} = x_{N+1}$ weapons
are allocated to the second stage, the left and right derivatives, $\nu^-(\omega)$ and $\nu^+(\omega)$
respectively, are calculated as:

$$\nu^-(x^{(2)}, \omega) = \begin{cases} Z(x^{(2)}, \omega) - Z(x^{(2)} - 1, \omega), & \text{if } x^{(2)} \ne 0 \\ 0, & \text{otherwise} \end{cases} \tag{3.12}$$

$$\nu^+(x^{(2)}, \omega) = Z(x^{(2)} + 1, \omega) - Z(x^{(2)}, \omega) \tag{3.13}$$

where $Z(x^{(2)}, \omega)$ is the solution by the MMR algorithm of the second stage problem
given $x^{(2)}$ weapons and sample realization $\omega$.
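
As a concrete illustration of (3.12) and (3.13), the sampled derivatives can be obtained from three MMR evaluations of the same realization. The Python below is a sketch under stated assumptions: `mmr_value` and `sample_derivatives` are hypothetical names, weapons are identical, and `values` and `q` describe only the targets that arrived in the sampled realization.

```python
def mmr_value(values, q, num_weapons):
    """Z(x, omega): value of the greedy MMR allocation for one realization."""
    surviving = list(values)
    total = 0.0
    for _ in range(num_weapons):
        if not surviving:          # no targets arrived in this realization
            break
        j = max(range(len(surviving)), key=lambda t: surviving[t] * (1 - q[t]))
        total += surviving[j] * (1 - q[j])
        surviving[j] *= q[j]
    return total

def sample_derivatives(values, q, x2):
    """Left and right derivatives of Z at x2, following (3.12) and (3.13)."""
    z_here = mmr_value(values, q, x2)
    v_minus = z_here - mmr_value(values, q, x2 - 1) if x2 != 0 else 0.0
    v_plus = mmr_value(values, q, x2 + 1) - z_here
    return v_minus, v_plus

# Hypothetical realization: two targets worth 200 arrive, q = 0.5.
left, right = sample_derivatives([200.0, 200.0], [0.5, 0.5], 2)
```

Here the second weapon is still a first shot at a fresh target (worth 100), while a third weapon would be a second shot (worth 50), so the left derivative is no smaller than the right one.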




Of course these are the left and right derivatives for only one sample realization
of the problem and for only one particular state, which is sufficiently captured in the
number of weapons passed to the second stage, $x^{(2)}$. The piecewise linear approximation
of $E[Z(x^{(2)}, \omega)]$ is defined by a finite set of ordered breakpoints, $\{(\nu^k, u^k) \mid k \in \mathcal{K}\}$,
where $\mathcal{K} = \{0, 1, \ldots, M\}$. Each breakpoint defines a linear segment with $\nu^k$ as the
slope of the segment projected from $u^k$, where a breakpoint is defined at each positive
integer up to, and including, M - 1. Concavity implies that the slopes are nonincreasing,
as $\nu^0 \ge \nu^1 \ge \ldots \ge \nu^{M-1}$. By Theorem 3.5.1, $P(\nu^-(x^{(2)}, \omega) \ge \nu^+(x^{(2)}, \omega)) = 1$
for all $x^{(2)} > 0$, since the slopes are always monotone decreasing and positive for all
realizations of targets in stage 2. From the solution of the subproblems by the MMR
algorithm, this can easily be proven for the 2-stage DWTA.

The left subgradient $\nu^-(x^{(2)}, \omega)$ is smoothed into the approximation slopes to the
left of $x^{(2)}$ to some minimal extent determined by the interval
$I = [\max(0, \min(x^{(2)} - \epsilon^-, u^{k^-})), \min(\max(x^{(2)} + \epsilon^+, u^{k^+ + 1}), M)]$.
The same idea is applied to the right of $x^{(2)}$
for the right subgradient. Concavity is preserved by the following theorem, similar to
the one found in [37]:

Theorem 3.5.2  Consider a concave approximation defined by breakpoints $\{(\nu^k, u^k) \mid k \in \mathcal{K}\}$,
where $u^k$ are the integers in {0, ..., M - 1}, and suppose the CAVE algorithm described
above is used to obtain $I = [u^m, u^n]$ with post-decision state $x^{(2)}$. Then concavity is
preserved under the smoothing operation where $0 \le \alpha \le 1$.

Proof: Case I: If $u^m = u^n$, no update takes place and concavity of the original function
is preserved.

Case II: If $I \ne \emptyset$, then we use the updates

$$\nu^k_{new} = \alpha \nu^-(x^{(2)}, \omega) + (1 - \alpha)\nu^k_{old} \quad \text{for } k = m, \ldots, x^{(2)} - 1 \tag{3.14}$$

and

$$\nu^k_{new} = \alpha \nu^+(x^{(2)}, \omega) + (1 - \alpha)\nu^k_{old} \quad \text{for } k = x^{(2)}, \ldots, n - 1 \tag{3.15}$$

The slopes of the original function decrease monotonically in k.

$$\nu^m \ge \cdots \ge \nu^{x^{(2)}-1} \ge \nu^{x^{(2)}} \ge \cdots \ge \nu^{n-1} \tag{3.16}$$

$$(1-\alpha)\nu^m \ge \cdots \ge (1-\alpha)\nu^{x^{(2)}-1} \ge (1-\alpha)\nu^{x^{(2)}} \ge \cdots \ge (1-\alpha)\nu^{n-1} \tag{3.17}$$

$$\alpha\nu^-(x^{(2)},\omega) + (1-\alpha)\nu^m \ge \cdots \ge \alpha\nu^-(x^{(2)},\omega) + (1-\alpha)\nu^{x^{(2)}-1} \ge \alpha\nu^+(x^{(2)},\omega) + (1-\alpha)\nu^{x^{(2)}} \ge \cdots \ge \alpha\nu^+(x^{(2)},\omega) + (1-\alpha)\nu^{n-1} \tag{3.18}$$

Equation 3.16 holds by the concavity of the original function. Equation 3.17 holds
by multiplication by the nonnegative constant $(1 - \alpha)$. Equation 3.18 holds since
$\nu^+(x^{(2)}, \omega) \le \nu^-(x^{(2)}, \omega)$. Therefore, the resulting function is also concave.  □
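
The smoothing step in (3.14) and (3.15) can be sketched as below. This is one possible reading of the update interval, stated as an assumption: the interval reaches at least $\epsilon^-$ segments to the left and $\epsilon^+$ segments to the right of $x^{(2)}$, and is widened to cover any left segment whose slope is already at or below the sampled left derivative and any right segment whose slope is still at or above the sampled right derivative. The name `cave_update` and its arguments are hypothetical.

```python
def cave_update(slopes, x, v_minus, v_plus, alpha, eps_minus, eps_plus):
    """Return new slopes after smoothing v_minus / v_plus in around state x.

    slopes[k] is the slope of the segment from k to k + 1 (nonincreasing in k).
    Assumes v_minus >= v_plus whenever x > 0, as holds for a concave Z.
    """
    last = len(slopes) - 1
    new = list(slopes)

    # Left end: at least eps_minus segments back, and far enough to cover
    # every segment left of x whose slope is already <= v_minus.
    left = x - eps_minus
    for k in range(x):
        if slopes[k] <= v_minus:
            left = min(left, k)
            break
    left = max(0, left)

    # Right end: at least eps_plus segments forward, and far enough to cover
    # every segment from x onward whose slope is still >= v_plus.
    right = x + eps_plus - 1
    for k in range(last, x - 1, -1):
        if slopes[k] >= v_plus:
            right = max(right, k)
            break
    right = min(last, right)

    for k in range(left, x):              # update (3.14)
        new[k] = alpha * v_minus + (1 - alpha) * slopes[k]
    for k in range(x, right + 1):         # update (3.15)
        new[k] = alpha * v_plus + (1 - alpha) * slopes[k]
    return new

updated = cave_update([100.0, 80.0, 60.0, 40.0], 2, 90.0, 30.0, 0.5, 1, 1)
```

In this small case the slopes left of the state are pulled toward 90 and those to the right toward 30, and the result remains nonincreasing, as Theorem 3.5.2 requires.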

3.5.3  The  Adaptive  DWTA  Algorithm. 

Having explained the components of the algorithm, the MMR, MMRPlus and
CAVE  algorithms  are  combined  to  form  the  solution  algorithm.  We  let  QApprox 
represent  the  current  approximation  and  use  the  following  algorithm  to  obtain  our 
approximate  dynamic  programming  solution: 

Step 1  Initialization

•  j = 0

•  Set $\nu^i = 0$ for all i = 0, ..., M - 1

•  Set $u^i = i$ for all i = 0, ..., M - 1

•  $\epsilon^- = 2$, $\epsilon^+ = 2$

Step 2  Forward Simulation

•  Solve current problem with current QApprox using MMRPlus

•  Generate second stage target random sample, $\omega \in \Omega$

•  If j > 20 then $\epsilon^- = 1$, $\epsilon^+ = 1$

•  $\alpha = 1/(1 + j)$

Step 3  Value Function Update

•  Determine the left and right derivatives, $\nu^-(x^{(2)}, \omega)$ and $\nu^+(x^{(2)}, \omega)$, respectively,
using MMR.

•  Update QApprox using the CAVE algorithm.

•  If no change in 10 iterations STOP, else j = j + 1 and return to Step 2.
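
The three steps can be combined into a compact simulation loop. The Python below is a rough sketch, not the implementation used for the reported experiments. Assumptions made here: all helper names are hypothetical, weapons are identical, second-stage targets arrive independently with probabilities `p_arrive`, at least one first-stage target exists, ties go to the first stage, the update interval is read as in the CAVE description above, and a fixed iteration count replaces the "no change in 10 iterations" stopping rule.

```python
import random

def mmr_value(values, q, w):
    """Value of the greedy (MMR) allocation of w identical weapons."""
    surviving, total = list(values), 0.0
    for _ in range(w if values else 0):
        j = max(range(len(values)), key=lambda t: surviving[t] * (1 - q[t]))
        total += surviving[j] * (1 - q[j])
        surviving[j] *= q[j]
    return total

def reserve_by_mmr_plus(values, q, slopes, m):
    """Weapons (out of m) that the greedy split holds back for stage 2."""
    surviving, x2 = list(values), 0
    for _ in range(m):
        j = max(range(len(values)), key=lambda t: surviving[t] * (1 - q[t]))
        if surviving[j] * (1 - q[j]) >= slopes[x2]:
            surviving[j] *= q[j]          # ties go to the first stage (assumption)
        else:
            x2 += 1
    return x2

def cave_update(slopes, x, v_minus, v_plus, alpha, eps):
    """Smooth the sampled derivatives into the slopes around x (cf. 3.14, 3.15)."""
    new, last = list(slopes), len(slopes) - 1
    left = min([x - eps] + [k for k in range(x) if slopes[k] <= v_minus])
    right = max([x + eps - 1]
                + [k for k in range(x, last + 1) if slopes[k] >= v_plus])
    for k in range(max(0, left), x):
        new[k] = alpha * v_minus + (1 - alpha) * slopes[k]
    for k in range(x, min(last, right) + 1):
        new[k] = alpha * v_plus + (1 - alpha) * slopes[k]
    return new

def adaptive_dwta(values1, q1, values2, q2, p_arrive, m, iterations, seed=0):
    """Return (weapons reserved for stage 2, learned slopes)."""
    rng = random.Random(seed)
    slopes = [0.0] * m                                   # Step 1: all slopes 0
    for j in range(iterations):
        x2 = reserve_by_mmr_plus(values1, q1, slopes, m)  # Step 2: forward pass
        arrived = [t for t in range(len(values2)) if rng.random() < p_arrive[t]]
        v = [values2[t] for t in arrived]
        q = [q2[t] for t in arrived]
        eps = 1 if j > 20 else 2
        alpha = 1.0 / (1 + j)
        z_here = mmr_value(v, q, x2)                      # Step 3: derivatives
        v_minus = z_here - mmr_value(v, q, x2 - 1) if x2 > 0 else 0.0
        v_plus = mmr_value(v, q, x2 + 1) - z_here
        slopes = cave_update(slopes, x2, v_minus, v_plus, alpha, eps)
    return reserve_by_mmr_plus(values1, q1, slopes, m), slopes

# Data patterned on the first example of Section 3.6 (8 + 8 targets, 12 weapons).
x2, slopes = adaptive_dwta([200.0] * 8, [0.5] * 8, [200.0] * 8, [0.5] * 8,
                           [0.5] * 8, m=12, iterations=200)
```

The sketch has not been validated against the reported results; it is meant only to show how the forward pass, the sampled derivatives, and the concavity-preserving update fit together.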

We initially set $\epsilon^- = \epsilon^+ = 2$ to allow the update of the piecewise linear concave
approximation of the value function to affect, at a minimum, an interval of size
four. Each possible integer value for the approximation is not sampled infinitely
often, so $\epsilon^-$ and $\epsilon^+$ allow the stochastic subgradients to be averaged over a greater
interval. The piecewise linear concave approximation of the value function is only
repetitively updated in the neighborhood of the optimal integer value for the second
stage decision. Therefore, the slopes of the approximate value function for integer
values far from the optimal integer value may be underestimates or overestimates of
the true slopes. However, because the function is concave, the critical region around
the optimal integer value is the most sampled and accurate. The accuracy around
the optimal integer value in the piecewise linear concave approximation of the value
function is all that is needed to provide quality solutions.

After 20 updates we set $\epsilon^- = \epsilon^+ = 1$, restricting the minimum update
to the left and right slopes of the piecewise linear concave approximation of the value
function. The size of the initial update interval and the rate at which the minimum
interval is allowed to decrease are problem dependent.

Theorem 3.5.3  Assume the optimal solution is $(x_1, x_2)$, and that the subgradients
$D^+$ and $D^-$ for $E[Z(x_2, \xi)]$ are known. Then, if $D^+(J^x(x_2)) = D^+$ and $D^-(J^x(x_2)) = D^-$,
the MMRPlus algorithm will generate the optimal solution $(x_1, x_2)$.

Proof: As shown in Theorem 3.5.1, the function $E[Q(x_2, \xi)]$ is monotone increasing
and integer concave. Assume that the MMRPlus algorithm generates the solution
$x_1, \ldots, x_{N+1}$ which is not optimal. Because the approximation is integer concave and
increasing, if the stopping slope obtained by MMRPlus, $\lambda$, is in the interval $[D^-, D^+]$,
the MMRPlus algorithm will obtain an optimal solution, because all the marginal
returns obtained for targets j = 1, ..., N will be computed exactly. Hence, there are
two possible cases where the optimal solution is not achieved:

Case I: $\lambda > D^+$. In this case, this requires that the MMRPlus algorithm find M
slopes with values greater than $D^+$. However, at the optimal solution, there are at
most $M - n_2$ slopes associated with targets j = 1, ..., N that are greater than $D^+$.
Since the approximation J is integer concave, there are only $n_2 - 2$ slopes greater
than $D^+$, the slope associated with the left derivative at $n_2$. This contradicts the
statement that MMRPlus found M slopes with values greater than $D^+$.

Case II: $\lambda < D^-$. In this case, the MMRPlus algorithm found fewer than M slopes
with values greater than or equal to $D^-$. At the optimal solution, MMRPlus has
$M - n_2$ slopes for j = 1, ..., N that are greater than or equal to $D^-$, which is greater
than $\lambda$. Furthermore, the integer concavity of J indicates that there are $n_2 + 1$
slopes greater than or equal to $\lambda$ for j = N + 1. This contradicts the property that
MMRPlus makes assignments in order of decreasing slopes, and stops after making
M assignments.  □

Corollary 3.5.4  The result in Theorem 3.5.3 can be relaxed to provide error bounds
for convergence for $D^+ - D^+(J^x(n_2))$ and $D^- - D^-(J^x(n_2))$, so that the critical slopes
only have to be accurate to a threshold $\epsilon > 0$.

Proof: Assume that $\lambda$ is the optimal solution of the SWTA problem using the
full dynamic programming value-to-go function and $n_2$ is the optimal allocation
to stage 2. As long as the approximate value-to-go J has the property that
$\lambda \in [D^-(J^x(n_2)), D^+(J^x(n_2))]$, the MMRPlus algorithm will find the optimal solution.
Thus, $D^+(J^x(n_2)) \ge \lambda \ge D^-$ and $D^-(J^x(n_2)) \le \lambda \le D^+$.  □

3.6  Computational  Results  and  Conclusions 

The Adaptive DWTA Algorithm works well for deterministic problems where all
second stage targets arrive with probability 1. In that case our algorithm is equivalent
to the MMR algorithm and yields the optimal solution. In this situation we are
simply dividing a weapon target assignment problem between two stages, and since
the gradient of the first stage function is known to be piecewise linear concave [30],
our piecewise linear concave function approximation is exact.

The first example problem has 8 targets in the first stage and up to 8 targets in
the second stage, each with identical values, $V_j = 200$. There are 12 total weapons.
We assume that the single shot probabilities of survival $q_j$ are identical and set to
0.5. We assume that each of the 8 second stage targets has an actual probability of
arrival of 0.5, where the arrival events of different targets are independent. Hence
this leads to $2^8 = 256$ possible arrival events (scenarios) at the second stage. For this
symmetric problem the optimal dynamic programming solution yields an optimal
strategy that uses 8 weapons in the first stage and 4 weapons in the second stage.
We conjecture that, since the number of first stage targets is equal to the number
of second stage targets and the second stage targets have a probability of arrival of
0.5, the number of weapons assigned to the first stage is twice that assigned
to the second stage in the optimal allocation of weapons. Our algorithm obtains
this result: 8 weapons are assigned to the first
stage and 4 weapons are assigned to the second stage, with a vector of slopes for the
approximation function given as

(100.0, 100.0, 99.4, 80.4, 63.8, 63.8, 45.5, 0.0, 0.0, 0.0, 0.0, 0.0)

The marginal value of the first weapon assigned to a target in the first stage is
100, and that of the second weapon is 50. The 8 first stage targets each contribute a value
of 100 to the objective value for the one weapon assigned to each target. The second
stage approximate value function shows that the average net value of one more
weapon, beyond the 4 already assigned, to the second stage is 63.8. Therefore, if an
additional weapon became available it should be assigned to the second stage. The
solution converged to the optimal assignment after 5 iterations, which is significantly
less computation than explicitly enumerating the
$2^8$ events to calculate the exact second stage function $E[Q(n_2, \omega)]$.

The example was run again with 13 weapons for our second example. The allocation
of the weapons was 8 to the first stage and 5 to the second stage, as expected,
with a vector of slopes for the approximation function given as

(100.0, 97.9, 97.9, 93.7, 64.3, 53.2, 53.1, 34.8, 0.0, 0.0, 0.0, 0.0, 0.0)

The vector is one component larger for the 13th weapon. The values around the
5th component should be close to the previous vector since we would expect this
subgradient to be sampled relatively often. The other components are not the same
because they are not in the critical region and are sampled rarely. It is easily seen
that the last five components of the vector should not be 0.0, but they are never sampled
in the construction of the approximate value function.

We then computed the closed form expression for the recourse function and compared
our experimental results with the exact slopes. The first six analytical slopes are

(99.61, 98.05, 91.80, 80.47, 63.87, 52.83)

The critical slopes are the 4th and 5th slopes for the first example and the 5th and
6th slopes for the second example, since this is the critical area of the approximate
function where the majority of updates occur. After 5000 iterations of the algorithm
the approximated gradients are within .07 of both critical slopes for the first example
and within .43 and .37 of the 5th and 6th slopes, respectively, for the second example.
Fortunately, 5000 iterations are not required, as seen by obtaining the answer in 6
iterations of our algorithm; the slopes must only be within a threshold value of the
optimal slopes, as shown in Corollary 3.5.4.
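
The analytic slopes for the symmetric example can be checked with a short calculation. The sketch below rests on a derivation that is an assumption of this illustration rather than a formula quoted from the text: when n identical targets arrive, MMR spreads weapons evenly, so the k-th reserved weapon is shot number ceil(k/n) at some target and is worth $V(1 - q)q^{\lceil k/n \rceil - 1}$. Averaging over the binomial number of arrivals gives the expected slope. The function name `expected_reserve_slopes` is hypothetical.

```python
from math import comb

def expected_reserve_slopes(n_targets, p_arrive, value, q, k_max):
    """Expected marginal value of the k-th weapon saved for stage 2, k = 1..k_max.

    Assumes identical second-stage targets that arrive independently with
    probability p_arrive, each with value `value` and survival probability q.
    """
    slopes = []
    for k in range(1, k_max + 1):
        slope = 0.0
        for n in range(1, n_targets + 1):      # n = 0 arrivals contributes 0
            prob = (comb(n_targets, n) * p_arrive ** n
                    * (1 - p_arrive) ** (n_targets - n))
            shot = -(-k // n)                  # ceil(k / n)
            slope += prob * value * (1 - q) * q ** (shot - 1)
        slopes.append(slope)
    return slopes

# First example: 8 possible targets, arrival probability 0.5, V = 200, q = 0.5.
slopes = expected_reserve_slopes(8, 0.5, 200.0, 0.5, 6)
```

Under these assumptions the six computed values agree with the analytic slopes listed above to two decimal places, and the fourth and fifth slopes (about 80.47 and 63.87) bracket the first-stage marginal values of 100 and 50 in the way the 8/4 allocation requires.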

For our third example we look at the same problem as described in our first and
second examples, except that the arrival probabilities are a realization of a uniform
distribution U(0, 1) for each second stage target and 50 weapons are assigned. The
probabilities used for this example are

0.480488, 0.888801, 0.275961, 0.840961, 0.768530, 0.719374, 0.825271, 0.123142

The analytic slopes are

(99.99, 99.90, 99.02, 94.69, 82.38, 65.09, 50.76, 47.40, 41.08, 40.00,
31.30, 31.30, 21.79, 21.79, 20.10, 15.51, 13.87, 13.87, 10.75, 10.75, ...)

The slopes found by the Adaptive DWTA Algorithm are

(100.0, 100.0, 100.0, 50.0, 50.0, 50.0, 29.16, 29.16, 29.16, 24.66, 24.66,
24.66, 23.88, 23.78, 23.78, 22.83, 22.81, 13.87, 10.71, 10.71, ...)

The optimal solution was $(x_1, x_2) = (32, 18)$ and was reached after 8 iterations.
Looking at the slope for $x_2 = 18$ we see that the Adaptive DWTA Algorithm converged
to the true analytic slope after 5000 iterations of the algorithm, taking .218
seconds to converge.

As a fourth and final example, we look at a larger example of 100 first stage targets
and 100 second stage targets whose values are each different. The Adaptive DWTA
Algorithm required 1.312 seconds for 5000 iterations and converged to the optimal
answer in 1886 iterations. This is compared to the $2^{100} = 1.26765 \times 10^{30}$ scenarios
that exist for calculation of the analytical solution of $E[Q(x^{(2)}, \omega)]$.

As demonstrated above, our initial results are favorable. Favorable results are
not surprising since we know by Theorem 3.5.1 that $E[Q(x^{(2)}, \omega)]$ is concave in $x^{(2)}$.
Therefore a piecewise linear concave approximation should be very descriptive if each
slope for each integer value is sampled infinitely often. That is not the case here, since we
limit the number of iterations to 5000 for each experiment and the slope is repetitively
sampled at the critical value of $x^{(2)}$. We have shown, however, that the slopes of
the approximation are very close to the slopes of $E[Q(x^{(2)}, \omega)]$ around the optimal
solution. The slopes obtained by the Adaptive DWTA Algorithm have been proven
sufficient through Corollary 3.5.4 and shown through experimentation to obtain the
optimal solutions.

The computational savings obtained by using the Adaptive DWTA Algorithm are
illustrated by the fact that for the fourth example the solution was obtained in 1.312
seconds rather than calculating the analytical solution using $1.26765 \times 10^{30}$ scenarios.
The Adaptive DWTA Algorithm is shown to be a fast optimal approach.

This  chapter  develops  a  solution  algorithm  for  a  two-stage  dynamic  weapon-target 
assignment  problem  and  proves  solution  optimality.  Future  work  will  relax  weapon 
homogeneity  assumptions,  and  investigate  the  impact  of  cost  constraints  and  defense 
system  sensor  capability  on  solution  quality. 

IV. Adaptive Dynamic Programming for a Two-Stage Dynamic Weapon-Target Assignment Problem


4.1  Abstract 

This research investigates the optimal allocation of weapons to a collection of
targets over a two-stage time horizon with battle damage assessment, which is more
widely known as the shoot-look-shoot problem. A single wave of targets arrives in
stage one and resources are allocated with the intent of maximizing the value of
destroyed targets. The result of the first stage allocations is realized, and the value
of destroyed targets is determined. The remaining resources are allocated to any
remaining targets, results are realized, and the additional value of targets destroyed
is determined. Though the shoot-look-shoot problem is more often approached as a
queueing problem where targets continually arrive, this chapter provides a two-stage
stochastic formulation. This research investigates allocation of stage-dependent
resources to non-homogeneous targets. An adaptive dynamic programming algorithm
is developed which provides high-quality solutions in a fraction of the time necessary
to compute an optimal solution and is scalable to large problems. The special
structure of the assignment problem is exploited and subgradient information is used
to update a functional approximation of future rewards.

4.2  Introduction 

The subject of this chapter is an effective method for generating high-quality
solutions to a two-stage weapon target assignment problem. The objective is to
maximize the total expected damage caused to the enemy’s targets using a finite number
of weapons. Optimally assigning interceptors to targets is a subject that has become
more critical with the increase in the technological sophistication of adversaries and
the potential proliferation of intercontinental ballistic missiles (ICBMs). The WTA
problem is known to be NP-complete [60].

In  general,  two  cases  of  the  WTA  problem  are  considered,  static  and  dynamic.  The 
static  case  allocates  m  weapons  to  n  targets  at  one  time  given  all  problem  information 
is  known.  The  dynamic  case  provides  an  allocation  policy  over  some  time  horizon,  for 
which  more  information  may  arrive  as  time  progresses.  Generally,  both  formulations 
contain  at  least  stochastic  single  shot  kill  probabilities  for  weapon-target  pairs,  and 
many  include  additional  uncertainties. 

One example of a dynamic problem is as follows. Suppose there are two waves
of incoming targets, where the number of targets, $n$, and their values, $V_j$ for
$j = 1, 2, \ldots, n$, are known for the first wave, and the second wave is known
only up to a probability distribution. If the single shot probability of the weapon
(interceptor) successfully hitting a target is $p$ and each shot’s outcome is independent
of the outcome of any other shot, then the decision space for a fixed number of
interceptors consists of how many interceptors to allocate to the first wave versus the
number of interceptors reserved for assignment to the second wave. This formulation is
attributed to Murphey [71], who proposes a stochastic decomposition approximation
technique. Ahner and Parson [4] present an effective algorithm that yields optimal
solutions for the problem discussed above given a fixed number of homogeneous
weapons. This research extends the work of Ahner and Parson [4] by incorporating
second stage target dependency on the first stage outcomes. Additionally, weapon
capabilities vary across stages. The remainder of the chapter is structured as follows.
A review of literature is given in Section 4.3, followed by the formal statement of
the problem in Section 4.4. Next, the proposed solution methodology is covered in
Section 4.5, followed by numeric results in Section 4.6. Finally, concluding remarks
and discussion of future research reside in Section 5.6.

4.3 Literature Review


The  weapon-target  assignment  problem  is  a  well  studied  problem  with  various 
sub-formulations.  This  section  presents  a  review  of  relevant  literature  for  three  of 
these  formulations,  beginning  with  the  static  weapon-target  assignment  problem. 

4.3.1 Static Weapon-Target Assignment.

Much of the literature has been dedicated to the SWTA problem formulation,
and several papers have been developed since the 2006 survey by Cai et al. [42]. As
with many NP-complete or other combinatorial optimization problems, the existing
literature applies a wide variety of methods to effectively solve the problem. Ahuja
et al. [5] present the most cited results from recent times and give a benchmark
for solution quality through lower bounding techniques (for the minimization problem).
They formulate the problem both as an integer linear program and as a general integer
network flow problem, using a minimum cost flow to determine a new lower bound (if
minimizing). The authors also provide a very large-scale neighborhood improvement
heuristic algorithm which quickly solves moderately sized instances (up to 80 weapons
and targets) optimally while providing high-quality solutions for larger problems (up
to 200 weapons and targets). As previously discussed, the earliest optimal methods
were presented by denBroeder [30] under a homogeneous weapon set assumption.
His method is generally known as the maximum marginal return (MMR) algorithm
(when considering the maximization problem) and assigns weapons sequentially to
the target with the highest remaining value until all weapons have been allocated.
This greedy method is also a fast method for bounding the solution when the
homogeneous weapons assumption has been relaxed. Chang et al. [24] and Orlin [74]
developed optimal methods under the assumption that each target can have no more
than one weapon assigned to it. These methods exploit the underlying network flow
structure of the SWTA problem.

Since the first approximation technique for the SWTA was developed in 1966 [28], a
gamut of popular metaheuristics has been applied to the SWTA problem. These
include ant colony optimization (ACO) [57] [88], particle swarm [34] [104] (of a slightly
more generalized resource allocation problem), and genetic algorithms (GAs) [19] [58]
[49] [61]. In addition, hybrid methods are used to provide solutions for the SWTA,
including ACO with simulated annealing (SA) [97], GA with ACO [33], GA using greedy
eugenics to improve the quality of the offspring [59], and particle swarm with embedded
greedy algorithms [50]. [95] provides a comparison of several heuristic algorithms for
the WTA problem and poses a new hybrid algorithm consisting of particle swarm and
random search to produce higher-quality solutions. In addition to these popular
metaheuristic methods, several other approximation methods have been used for the
SWTA. [26] uses a modified MMR type algorithm after changing the network
representation from a one-to-many to a one-to-one mapping to efficiently approximate
the optimal value. Rosenberger et al. [85] compare the sequential application of the
auction algorithm in a greedy fashion to an exact (but computationally expensive)
branching and bounding technique. [62] applies fuzzy reasoning to approximate optimum
allocations in real-time for use on a battlefield. Lastly, Lagrangian relaxation [72] was
used to decompose the problem into two tractable subproblems while iteratively
updating the Lagrange multipliers. Though an extensive amount of research has been
done into effectively providing high-quality solutions for the SWTA, no method
distinctly stands out as the best. Next, a more complex dynamic weapon target
assignment formulation is presented, prior to providing a review of existing literature.

4.3.2 Dynamic Weapon-Target Assignment.


Though it has not been researched to the extent of the SWTA problem, the
DWTA problem provides a more practical implementation by considering the impact
current decisions have on future states. However, by breaking the problem up into
several decision epochs, the DWTA is a much more complex problem. Similar to
the SWTA, numerous methods have been employed to provide solutions for various
types of DWTA problems. As the originator of the dynamic instance, Hosein [47]
provides several results which are generalizable to the DWTA problem. Additionally,
Castanon [20] and others at ALPHATECH were developing advanced algorithms
for the DWTA in parallel. Murphey [70] [71] uses stochastic decomposition for a
slightly different two-stage problem. Chang [24] uses a static WTA approximation
scheme within an iterative linear network flow framework to efficiently provide
high-quality solutions for the DWTA. Because of the integer restriction for the decision
variables, the chromosome representation within a GA presents a useful scheme for
solving both the static and dynamic versions of the WTA problem. As such, much
work has developed hybrid GAs to assist in solving the DWTA. Wu et al. [99] apply a
modified GA to the DWTA and introduce weapon use deadlines within the problem
formulation. Xin et al. [101] develop a heuristic which uses problem information
(domain knowledge) and constraint programming to assign priorities to assignments.
Evolutionary heuristics which use a hybridized GA with memetic algorithms have
also been applied to the DWTA [25]. Additionally, Khosla [54] applies a hybrid
heuristic which uses a simulated annealing (SA) type heuristic to determine the fitness
of a population within a GA framework. Other heuristic techniques applied to the
DWTA include Tabu Search [102], ACO with tabu table updates [103], and a modified
Hungarian method with PSO [56] (though this is in an open source text, so its rigor
may be unverified). Lastly, exact dynamic programming [89] [91] has also been applied
to the DWTA.

4.3.3  Shoot-Look-Shoot. 

The shoot-look-shoot class of problem is generally found in naval literature. Manor
and Kress [64] establish the optimality of a multi-stage greedy SLS solution assuming
imperfect damage information. They also show that the original SLS problem is
equivalent to a finite horizon deteriorating bandit problem, which dynamically
allocates a single resource amongst a fixed number of arms. Aviv and Kress [8] evaluate
several SLS tactics (such as the persistent shooter, fixed bound on munitions and
dynamic bound on munitions) and analyze their efficiency when damage information
is uncertain (or incomplete). Glazebrook and Washburn [35] provide a brief survey
of the SLS problem, and further investigate it by considering several scenarios in
which information may be perfect or imperfect, the time horizon is finite or infinite,
and homogeneity (or non-homogeneity) of weapons is considered. They approach
the problem as a partially observable Markov decision process (POMDP) and apply
dynamic programming, citing the computational intractability of their methods as
problem size increases. Yost and Washburn [105] also decompose the problem, using a
linear program to obtain an initial (bound) set of policies and dynamic programming
to help improve the policies. The dynamic programming subproblem is also viewed
as a POMDP, as in [35]. Karasakal [51] applies integer programming decomposition
to determine SLS policies for allocating surface-to-air missiles within a naval task
group. Castanon [23] approaches the SLS problem as a two-stage resource allocation
where the goal is to maximize the first stage allocations while considering the second
stage recourse requirements. The formulation then takes on a similar form to that
of the two-stage stochastic control problem defined by Murphey [71], and also looks
very much like a constrained two-stage form of Bellman’s equation [10]. Linear
interpolation and Lagrangian decomposition are then used to determine optimal recourse
actions for the second stage. These values are then used recursively to greedily
determine an approximate solution of the first stage problem.

4.4  Problem  Formulation 

Because  it  forms  the  basis  from  which  the  formulation  is  developed,  the  generalized 
static  weapon  target  assignment  (SWTA)  is  first  introduced. 

4.4.1 Static Weapon-Target Assignment.

The SWTA is formulated as follows. Let $V_j$ denote the value of the $j$th target and
let $W_i$ denote the number of available weapons of type $i$. It is assumed that there
are $m$ weapon types and $n$ targets. Let $p_{ij}$ be the single shot probability that a
weapon of type $i$ will kill target $j$, such that the single shot probability of survival
is $q_{ij} = 1 - p_{ij}$. The decision variable $x_{ij}$ is the number of weapons of type $i$
assigned to target $j$. The SWTA problem is then formulated as a nonlinear integer program:

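$$\min \sum_{j=1}^{n} V_j \prod_{i=1}^{m} q_{ij}^{x_{ij}} \tag{4.1}$$

subject to

$$\sum_{j=1}^{n} x_{ij} \le W_i \quad \text{for all } i = 1, 2, \ldots, m, \tag{4.2}$$

$$x_{ij} \ge 0 \text{ and integer, for all } i = 1, 2, \ldots, m,\; j = 1, 2, \ldots, n. \tag{4.3}$$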

Much of the WTA literature has been dedicated to the SWTA problem formulation,
which was shown to be NP-complete in 1986 by Lloyd and Witsenhausen [60]. As
such, a great deal of research has been done in the past several decades to determine
effective methods of identifying optimal solutions. Computationally efficient optimal
methods exist for two cases of the SWTA under simplifying assumptions. First,
given a homogeneous weapon set, $p_{ij} = p_j$ for all $i$, denBroeder [30] shows optimality
is achieved by evenly distributing the weapons across as many targets as possible
using the maximum marginal return (MMR) algorithm. The second instance assumes
that each target can have at most one weapon assigned to it [24] [75]. Because this
chapter focuses on a special instance of the dynamic WTA (DWTA) problem, relevant
literature from this class is reviewed in Section 4.3.2.
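
As an illustration of the homogeneous-weapon case described above, the following is a minimal Python sketch of a greedy maximum marginal return allocation. It is not denBroeder's original presentation; the function name `mmr_allocate`, its arguments, and the heap-based bookkeeping are assumptions made for this example, and at least one target is assumed.

```python
import heapq


def mmr_allocate(values, survival_probs, num_weapons):
    """Greedy maximum-marginal-return sketch for identical weapons.

    values[j]         -- value V_j of target j (hypothetical input name)
    survival_probs[j] -- single shot survival probability q_j = 1 - p_j
    num_weapons       -- number of identical weapons to assign
    Returns (allocation, expected value destroyed).
    """
    n = len(values)
    allocation = [0] * n
    surviving = list(values)  # expected surviving value of each target
    # heapq is a min-heap, so gains are stored negated to pop the largest.
    heap = [(-surviving[j] * (1.0 - survival_probs[j]), j) for j in range(n)]
    heapq.heapify(heap)
    total = 0.0
    for _ in range(num_weapons):
        neg_gain, j = heapq.heappop(heap)
        total += -neg_gain            # marginal return of this shot
        allocation[j] += 1
        surviving[j] *= survival_probs[j]
        heapq.heappush(heap, (-surviving[j] * (1.0 - survival_probs[j]), j))
    return allocation, total


allocation, destroyed = mmr_allocate([10.0, 5.0], [0.5, 0.5], 3)
```

Each shot goes to the target whose expected surviving value would drop the most, after which that target's marginal return shrinks by the factor $q_j$.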

4.4.2  Two-Stage  Dynamic  Weapon-Target  Assignment. 

Consider a problem where a single wave of targets arrives, and instead of allocating
all weapons at once, allocation is done over two stages. Weapon capabilities are stage
dependent, meaning that kill probabilities for stage one and stage two differ. Next,
assume the outcomes of any stage one weapon allocations are determined prior to
allocating additional resources in stage two. This problem is formulated as follows.
Let $N$ targets arrive in the first stage, at which point $x_{1j}$ shots are allocated to target
$j$, for all $j = 1, 2, \ldots, N$. Let $\Omega$ be the set of all possible outcomes of the stage one
allocations; then $\omega \in \Omega$ denotes a sample realization. This formulation allows for
the single shot probabilities of survival, $q_{tj}$, $t = 1, 2$, to vary by stage, representing
non-homogeneity of weapons. This may be interpreted as two types of weapons,
two weapon locations, or changing capabilities as time goes on. Let $J_1$ be the set
of targets for which weapons are allocated in stage one; target values, $V_{tj}$, $t = 1, 2$,
transition probabilistically for each target $j \in J_1$ from stage one to two. Let $n_2$ be
the number of remaining targets after the stage one outcome has been determined.
Next, $x_{2j}$ weapons are allocated to each target $j \in J_2(\omega)$, where $J_2(\omega)$ is the set of
targets in stage two, which depends on the outcome $\omega$. Let $C$ be the resource pool
from which weapons are selected. Additionally, there are $m_1$ weapons of type 1 and
$m_2$ weapons of type 2. Then the 2-stage DWTA programming formulation is:


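$$Z_1(x) = \max \sum_{j=1}^{N} V_{1j}\left(1 - (q_{1j})^{x_{1j}}\right) + E_{\omega \in \Omega}\left[Z_2(x_2, \omega)\right] \tag{4.4}$$

where

$$Z_2(x_2, \omega) = \max_{x_2} \sum_{j=1}^{n_2} V_{2j}\left(1 - (q_{2j})^{x_{2j}}\right) \tag{4.5}$$

subject to

$$x_1 + x_2 \le C \tag{4.6}$$

$$x_1 \le m_1 \tag{4.7}$$

$$x_2 \le m_2 \tag{4.8}$$

$$x_{tj} \in \mathbb{N}; \quad t = 1, 2; \quad j = 1, \ldots, N. \tag{4.9}$$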

$Z_2(x_2, \omega)$ is the expected second stage value and, given a number of second stage
weapons, $m_2$, and a sample outcome of remaining targets, $\omega$, is piecewise integer
concave (for a proof of this, see [2]) and is solved using the MMR algorithm.
Constraint (4.6) is the resource constraint, constraints (4.7) and (4.8) are the capacity
constraints, and constraint (4.9) is the integrality constraint on the decision variables.

4.5  Methodology 

Because of the special structure of this stochastic program, the work of [4] is
extended to consider the case where a second stage approximation of the value function
is used to allow for efficient solutions of the overall problem. This method uses Monte
Carlo sampling of the first stage outcomes to approximate the stage two value
function. Because of the concavity of the stage two function, the proposed algorithm is
able to quickly determine an approximation of the tradeoff between using weapons in
the first stage and using weapons in the second stage. Because the second stage
utilizes a single resource class, the optimality of the MMR algorithm of denBroeder [30]
is exploited to generate optimal allocations for any sampled $\omega \in \Omega$. This is further
used to efficiently generate high-quality solutions using much less computational time
than is required to enumerate all possible first stage outcomes.

4.5.1  Adaptive  Dynamic  Programming. 

Consider a general finite space and discrete time horizon dynamic programming
problem. Let $\mathcal{S}$ be the state space of the system with time horizon $t = 0, \ldots, T$.
The state $S_t \in \mathcal{S}$ represents the state of the system at time $t$, and a decision $x_t$ that
acts on the system is selected from a finite set $U$ at each time step. $W_t$ is a random
occurrence generated with a known probability distribution, and the system evolves
according to a transition function which has the form

$$S_{t+1} = f_1(S_t, x_t, W_t) \tag{4.10}$$

where $f_1(\cdot)$ is a function describing the system dynamics. Next, define the one-period
contribution for being in state $S_t$ and making decision $x_t$ as $C_t(S_t, x_t)$ and express
the $T$-stage value to be maximized as the expected value of the summation of the
one-period contributions:

$$\max_{x_t \in U(S_t)} E\left\{ \sum_{t=0}^{T} C_t(S_t, x_t) \,\middle|\, S_0 \right\} \tag{4.11}$$
It is well known that problems of the form given in (4.11) can be solved by
Bellman’s optimality equations [12]:

$$J_t(S_t) = \max_{x_t}\left( C_t(S_t, x_t) + E_{W_t}\left\{ J_{t+1}\left(S_{t+1}(S_t, x_t, W_t)\right) \mid S_t \right\} \right) \tag{4.12}$$

Problems of this type grow exponentially within the state, decision, and outcome
spaces, which is known as the curses of dimensionality. Therefore it is necessary to
approximate the value function $J_t(S_t)$. Adaptive Dynamic Programming provides a
means for stepping forward through time iteratively using sample realizations of the
approximated value function.
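
To make the recursion in (4.12) concrete, the following Python sketch solves a toy single-target shoot-look-shoot problem exactly by backward recursion. The toy model is an assumption made for illustration and is not the formulation of this chapter: one target of value `value`, a per-shot cost `shot_cost` (hypothetical, added so that holding fire can be worthwhile), and a random outcome $W_t$ that is simply whether the target survives the shots fired at stage $t$.

```python
from functools import lru_cache


def solve_shoot_look_shoot(value, q, shot_cost, weapons, stages):
    """Exact backward recursion for a toy single-target problem.

    Returns (optimal expected net reward, optimal number of shots at stage 0).
    """

    @lru_cache(maxsize=None)
    def J(t, s):
        # Value of being at stage t with s weapons left and the target alive.
        if t == stages or s == 0:
            return 0.0, 0
        best_value, best_x = float("-inf"), 0
        for x in range(s + 1):
            p_survive = q ** x  # target survives all x independent shots
            kill_reward = (1.0 - p_survive) * value
            # Expectation over the outcome: continue only if the target survives.
            future = p_survive * J(t + 1, s - x)[0]
            total = kill_reward - shot_cost * x + future
            if total > best_value:
                best_value, best_x = total, x
        return best_value, best_x

    return J(0, weapons)


one_stage = solve_shoot_look_shoot(10.0, 0.5, 1.0, 2, 1)
two_stage = solve_shoot_look_shoot(10.0, 0.5, 1.0, 2, 2)
```

With these inputs the two-stage policy fires one shot, observes the result, and fires again only if needed, which gives a higher expected net reward than the single-stage salvo. Exact recursion of this kind is only practical for very small state spaces, which is the motivation for the approximation that follows.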

4.5.2  Approximation  of  the  Second  Stage  Value  Function. 

For  any  first  stage  weapon  allocation,  the  number  of  outcomes  grows  exponentially 
in  the  number  of  weapons  allocated  and  targets  with  weapons  allocated  to  them. 
However, because of its concavity, estimates of the second stage value function can
be generated by sampling outcomes of the first stage. The concave adaptive value
estimation  (CAVE)  algorithm  of  Godfrey  and  Powell  [38]  provides  such  a  method 
for  approximation.  CAVE  uses  stochastic  subgradient  information  representing  the 
marginal  value  for  saving  enough  resources  to  use  an  additional  weapon  in  stage  two. 
The  CAVE  algorithm  is  shown  in  Algorithm  1. 
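
The idea of maintaining a concave piecewise linear approximation from sampled slopes can be sketched in a few lines of Python. The sketch below is deliberately simpler than CAVE: it smooths a single slope and then restores concavity by a monotone projection, rather than using CAVE's smoothing interval and breakpoint insertion. The function name and arguments are assumptions made for this example.

```python
def update_concave_slopes(slopes, s, sampled_slope, alpha):
    """Smooth one slope toward a sample, then restore concavity.

    slopes[k]     -- current estimate of value(k + 1) - value(k); must be
                     nonincreasing in k for the approximation to be concave
    s             -- index at which the marginal value was sampled
    sampled_slope -- sampled marginal value of one more reserved weapon
    alpha         -- smoothing step size in (0, 1]
    """
    new = list(slopes)
    new[s] = (1.0 - alpha) * new[s] + alpha * sampled_slope
    # Slopes to the left may not be smaller than the updated slope.
    for k in range(s):
        new[k] = max(new[k], new[s])
    # Slopes to the right may not be larger than the updated slope.
    for k in range(s + 1, len(new)):
        new[k] = min(new[k], new[s])
    return new


updated = update_concave_slopes([5.0, 4.0, 3.0, 2.0], 2, 6.0, 0.5)
```

Here the sampled slope of 6.0 at index 2 pulls that slope up to 4.5, and the neighbor at index 1 is raised to 4.5 so that the slopes remain nonincreasing.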

4.5.3  Adaptive  Dynamic  Programming  for  a  Two-Stage  DWTA. 

The CAVE algorithm is implemented within the MMRPlus algorithm of [4] to
efficiently generate high-quality solutions to the problem defined in (4.4) - (4.9).
Powell’s post-decision state notation [83] is adopted and the second stage post-decision
value function is defined as
Algorithm 1 Concave Adaptive Value Estimation (CAVE) Algorithm [38]

STEP 1 Initialization

• Let $\mathcal{K} = \{0\}$, where $v^0 = 0$, $u^0 = 0$.

• Set $\epsilon^-$, $\epsilon^+$, $\alpha$.

STEP 2 Collect Gradient Information

• Given a state $s \ge 0$, sample the gradients $\pi^-(s, \omega)$ and $\pi^+(s, \omega)$ with random
outcome $\omega \in \Omega$.

STEP 3 Define Smoothing Interval

• Let $k^- = \min\{k \in \mathcal{K} : v^k \le (1 - \alpha)v^{k+1} + \alpha\pi^-(s)\}$ and
$k^+ = \max\{k \in \mathcal{K} : (1 - \alpha)v^{k-1} + \alpha\pi^+(s) \le v^k\}$.

• Define the smoothing interval $Q = [\min\{s - \epsilon^-, u^{k^-}\}, \max\{s + \epsilon^+, u^{k^+ + 1}\})$.

• Create new breakpoints at $s$ and the endpoints of $Q$.

STEP 4 Perform Smoothing

• For each segment in $Q$, $v^k_{new} = \alpha\pi + (1 - \alpha)v^k_{old}$, where $\pi = \pi^-(s)$ if $u^k < s$
and $\pi = \pi^+(s)$ otherwise.

• Adjust $\epsilon^-$, $\epsilon^+$, $\alpha$ according to step size rules.

• Return to Step 2.

$$J_2^x(S_2^x) = E_{\omega \in \Omega}\left[Z_2(x_2, \omega)\right] \tag{4.13}$$

The post-decision state is then $S_2^x = n_2$, and for any given number of weapons, the
slopes produced by CAVE represent the marginal value of reserving a weapon for the
second stage. The MMRPlus algorithm is shown in Algorithm 2.


Algorithm 2 MMRPlus Algorithm [4]

STEP 0 Initialization - Given $J_2^x(S_2^x)$

• $x_j = 0 \;\forall j = 1, \ldots, N$ and set $x_{N+1} = x^{(2)}$

• Set $S_j = V_j$ for $j = 1, \ldots, N$.

• Compute the marginal returns $MR_j = S_j(1 - q_j) \;\forall j$, $MR_{N+1} = J_2^x(1) - J_2^x(0)$

• Initialize weapon index $i = 1$.

while $i \le M$ do

    • Find target $k$ for which weapon $i$ has the greatest effect, compute
      $k = \arg\max_{j = 1, \ldots, N+1} MR_j$

    • Increment the allocation to target $k$: $x_k \leftarrow x_k + 1$

    if $k \le N$ then

        Update the expected surviving value $S_k = S_k q_k$, and update the marginal
        return $MR_k = S_k(1 - q_k)$

    else

        Increment $x_{N+1} \leftarrow x_{N+1} + 1$ and update the marginal return
        $MR_{N+1} = J_2^x(x_{N+1} + 1) - J_2^x(x_{N+1})$

    end if

    Set $i = i + 1$

end while


Using MMRPlus, the optimal allocation for stage one weapons is determined using
the second stage approximation from the Monte Carlo experimentation. Given a
realization of second stage target arrivals $\omega$ and the current solution of the MMRPlus
algorithm where $x_2 = C - x_1$ weapons are allocated to the second stage, the left and
right derivatives, $v^-(\omega)$ and $v^+(\omega)$ respectively, are calculated as:

$$v^-(x_2, \omega) = \begin{cases} Z(x_2, \omega) - Z(x_2 - 1, \omega), & \text{if } x_2 \neq 0 \\ 0, & \text{otherwise} \end{cases} \tag{4.14}$$

$$v^+(x_2, \omega) = Z(x_2 + 1, \omega) - Z(x_2, \omega) \tag{4.15}$$

where $Z(x_2, \omega)$ is the solution by the MMR algorithm of the second stage problem
given $x_2$ weapons and sample realization $\omega$. This ensures that any excess resources
after the first stage allocation are used in the second stage. The algorithm is presented
as Algorithm 3. Note that the second stage probability function is dependent on
the first stage allocation. Therefore, Equation (4.4) is not necessarily concave, and
optimality is not guaranteed, unlike in [4].
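
The left and right differences in (4.14) and (4.15) can be sketched in Python as follows. This is an illustrative sketch only: `mmr_value` is a hypothetical helper that evaluates the second stage greedily for the targets surviving in one sampled outcome, and the argument names are assumptions rather than the author's implementation.

```python
def mmr_value(values, q, weapons):
    """Expected value destroyed by a greedy (MMR) allocation of identical weapons."""
    surviving = list(values)  # expected surviving value per remaining target
    total = 0.0
    for _ in range(weapons):
        if not surviving:     # no targets remain in this sampled outcome
            break
        j = max(range(len(surviving)), key=lambda k: surviving[k] * (1.0 - q[k]))
        total += surviving[j] * (1.0 - q[j])
        surviving[j] *= q[j]
    return total


def sample_slopes(values, q, x2):
    """Left and right differences of the second stage value at x2 weapons."""
    z = mmr_value(values, q, x2)
    v_minus = z - mmr_value(values, q, x2 - 1) if x2 != 0 else 0.0
    v_plus = mmr_value(values, q, x2 + 1) - z
    return v_minus, v_plus


v_minus, v_plus = sample_slopes([10.0, 4.0], [0.5, 0.5], 2)
```

For a fixed outcome the left difference is never smaller than the right difference, which reflects the concavity of the second stage value in the number of weapons.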


Algorithm 3 Adaptive Dynamic Programming Algorithm for 2 Stage DWTA

Initialize: $x_1 = m_1$, set $\epsilon^-$, $\epsilon^+$, $\alpha$,

Initialize: $v^- = v^+ = 0$, $v_i = 0$ for $i = 1, \ldots, m_1$,

Set: $a = 1$, and fix iterations.

while $a \le$ iterations do

    Determine optimal assignment of $x_1$ using MMR algorithm

    Using the assignment of $x_1$, generate outcome $\omega \in \Omega$ using Monte Carlo
    sampling

    Set $x_2 = C - x_1$

    if $n_2 > 0$ then

        Determine optimal assignment of $x_2$ using MMRPlus - call it $J$

        Determine optimal assignment for $x_2 - 1$ and $x_2 + 1$ using MMRPlus - call
        them $J^-$ and $J^+$, respectively

        $v^- = J - J^-$, $v^+ = J^+ - J$

    else

        $v^- = v^+ = 0$

    end if

    Update $v_i$ for $i = 1, \ldots, m_1$ using CAVE algorithm

    $a = a + 1$, update $\epsilon^-$, $\epsilon^+$, $\alpha$

end while
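
The Monte Carlo sampling step of Algorithm 3 can be sketched as follows, assuming independent shots so that target $j$ survives $x_{1j}$ shots with probability $(q_{1j})^{x_{1j}}$. The function name, argument names, and the use of Python's `random.Random` are assumptions made for this illustration.

```python
import random


def sample_outcome(allocation, q1, rng):
    """Sample one first stage outcome.

    allocation[j] -- number of stage one shots fired at target j
    q1[j]         -- stage one single shot survival probability of target j
    rng           -- a random.Random instance (seeded for reproducibility)
    Returns a list omega with omega[j] = 1 if target j survives, 0 if destroyed.
    """
    return [
        1 if rng.random() < q1[j] ** allocation[j] else 0
        for j in range(len(allocation))
    ]


rng = random.Random(0)
omega = sample_outcome([2, 0, 1], [0.3, 0.3, 0.3], rng)
# Targets that remain for the second stage in this sample.
remaining = [j for j, alive in enumerate(omega) if alive == 1]
n2 = len(remaining)
```

A target that receives no shots always survives, so it always appears in the second stage target set of the sampled outcome.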

4.6 Numeric Results and Discussion


Initially, small scale experiments are run so that exact solutions can be determined
for comparison with the proposed method. Two approximations are used as benchmarks
for further comparison. These benchmark approximations are used because
of their computational simplicity as well as their ability to provide quality solutions
for comparison. All experiments discussed herein use Matlab 2013a on a 3.07 GHz
Intel Xeon with 24 GB RAM.

4.6.1  Small  scale  experiments. 

To test the algorithm, 100 problem instances are randomly generated as follows.
Integer target values are randomly generated, ranging from one to ten, and $N$ is
fixed. Survival probabilities are independently and randomly selected with
$q_{tj} \sim \mathrm{UNIF}(0.1, 0.4)$ for $t = 1, 2$ and $j = 1, 2, \ldots, N$. For the initial set of
experiments, $M$ is fixed at seven, $N$ is varied at seven and eleven and the values
computed. Two approximation schemes are selected for comparison of the proposed
method. First, a greedy approximation developed by Castanon and Wohletz [22] is used
that proves very effective for small scale tests, but suffers extensive computation time
for larger problems. This algorithm is presented as Algorithm 4.

In order to describe the greedy approximation of [22], several items must be
defined. Define $x_1 = (x_{11}, x_{12}, \ldots, x_{1N})$ and let
$x_1^j = (x_{11}, \ldots, x_{1(j-1)}, x_{1j} + 1, x_{1(j+1)}, \ldots, x_{1N})$.
Let $\Omega = \{0, 1\}^N$ denote the outcome space for a given allocation, where $\omega_j = 0$ denotes
that target $j$ has been destroyed, and $\omega_j = 1$ denotes that target $j$ survives. Given a stage
one allocation $x_1$, then

$$P(\omega \mid x_1) = \prod_{\{j \mid \omega_j = 0\}} \left(1 - (1 - p_{1j})^{x_{1j}}\right) \times \prod_{\{j \mid \omega_j = 1\}} (1 - p_{1j})^{x_{1j}} \tag{4.16}$$

In addition, define $V(\omega) = \sum_{\{j \mid \omega_j = 0\}} V_{1j}$, and

$$J(x_1) = \sum_{\omega \in \Omega} V(\omega) P(\omega \mid x_1) J_2^*\left(\omega, M - \sum_{i=1}^{N} x_{1i}\right) \tag{4.17}$$

The  greedy  algorithm  is  then 


Algorithm 4 2-stage greedy WTA algorithm [22]

Initialize: $x_{1j} = 0$, for $j = 1, 2, \ldots, N$.
while $\sum_{j=1}^{N} x_{1j} < M$ do
    For each $j$, compute $MR_j(x_1) = J(x_1) - J(x_1^j)$
    Select $j^*$ for which $MR_{j^*}(x_1) \le MR_j(x_1)$ for all $j \neq j^*$.
    if $MR_{j^*} < 0$ then
        Set $x_{j^*} = x_{j^*} + 1$
    else Break
    end if
end while
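
The outcome probability in Equation (4.16) can be evaluated directly, as in the hedged Python sketch below; the function name and arguments are assumptions made for this example. Enumerating the outcome space also shows why an exact evaluation becomes expensive: there are $2^N$ outcomes for $N$ targets.

```python
from itertools import product


def outcome_probability(omega, allocation, p):
    """Probability of a stage one outcome under independent shots.

    omega[j]      -- 0 if target j is destroyed, 1 if it survives
    allocation[j] -- number of stage one shots at target j
    p[j]          -- stage one single shot kill probability for target j
    """
    prob = 1.0
    for alive, shots, p_kill in zip(omega, allocation, p):
        survive = (1.0 - p_kill) ** shots
        prob *= survive if alive == 1 else 1.0 - survive
    return prob


allocation = [1, 2]
p = [0.5, 0.5]
# All 2^N outcomes; their probabilities should sum to one.
total = sum(
    outcome_probability(omega, allocation, p)
    for omega in product((0, 1), repeat=len(allocation))
)
```

For example, with one shot at the first target and two at the second, the outcome in which the first target is destroyed and the second survives has probability $0.5 \times 0.25 = 0.125$.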


Because of the second stage dependency on the first stage outcome, Algorithm 4 is
not optimal. However, it has been shown to provide optimal solutions to randomly
generated problems of smaller size [22], and is used to provide a metric for larger
sized problems due to its relative computational tractability. In the second
approximation, denoted MMR in the results tables, all possible combinations of $x_1$ and $x_2$
are generated. MMR is then run for each stage on every possible $x = (x_1, x_2)$, and
the $x$ which maximizes the sum of the two stage expected value is selected. Since
it considers all possible outcomes, the exact expected target destruction value is
reported for the CW (Castanon and Wohletz) heuristic. Because of the dependence on
first stage outcomes, 1000 Monte Carlo simulations are run to determine the expected
value for the MMR and ADP policies. The expected value for the MMR method is also
presented for further validation. Where appropriate, common random numbers were used
to reduce experimental variation. The average optimality gap and associated standard
deviations for the 100 experiments are presented in Table 1. Additionally, computation
time for each method is reported in Table 2. The results of the first 10 experiments
for each problem size are shown in Figure 2.

Table 1. Optimality gap (%) for 100 randomly generated problem instances

M     N     CW Heur. % Diff     ADP % Diff        MMR Sim % Diff
5     5     0.0                 0.087 ± 1.6       6.44 ± 4.55
5     10    0.0                 0.074 ± 3.7       1.56 ± 2.24
10    5     0.0                 1.07 ± 0.093      2.76 ± 1.86
10    10    0.0                 1.13 ± 1.21       7.36 ± 4.53

Table 2. Computation time (seconds) for 100 randomly generated problem instances

M     N     CW Heur              ADP                  MMR
5     5     0.0058 ± 0.0038      0.0815 ± 0.0214      0.0047 ± 0.0112
5     10    0.0116 ± 0.0064      0.0740 ± 0.0086      0.0034 ± 0.0005
10    5     0.0192 ± 0.0045      0.0779 ± 0.0117      0.0072 ± 0.0003
10    10    0.1020 ± 0.0586      0.1044 ± 0.0072      0.0073 ± 0.0009

For this set of small scale experiments, solving exactly or using the first approximation
method is preferable. However, the value obtained through the ADP algorithm
is very competitive, and the strength of the ADP method emerges as problem size
increases.

The  next  set  of  experiments  varies  the  number  of  weapons  and  targets  between 
10  and  20  to  determine  the  effectiveness  of  the  method  on  slightly  larger  problem 
sizes.  The  CW  heuristic  and  the  two-stage  MMR  approximation  remain  the  principal 
benchmarks.  For  these  experiments,  50  test  problems  are  randomly  generated  using 
the  same  parameters  as  above  with  1000  simulations  run  on  the  solutions.  Since  the 
simulated  MMR  results  provide  a  sufficient  estimate  of  the  approximation,  they  are 
reported for this analysis. The average percent difference from the CW heuristic for
the ADP and MMR methods is presented in Table 3, with computation times in
Table 5.

The CW heuristic was computed in a reasonable amount of time for problems
with fewer than 40 weapons or targets. However, these problems take several minutes
each to solve, and in some cases, storage is a constraining factor. Hence, the percent
improvement of the ADP method over that of the MMR method is used as the primary
metric. The results of this analysis are shown in Table 4.

Figure 2. Results for first 10 small sized experiments at varying W & T. Panels show
the value comparison for problems 1-10: (a) W = 5, T = 5; (b) W = 5, T = 10;
(c) W = 10, T = 5; (d) W = 10, T = 10.

Table 3. Gap from CW Heur for 50 randomly generated medium sized problems

                        Gap (%) ± (std dev)
Weapons    Targets      ADP               MMR
10         20           0.55 ± 2.91       2.02 ± 2.09
20         10           0.8 ± 0.89        3.12 ± 2.07
20         20           0.87 ± 0.97       9.78 ± 4.17

Figure 3. Results for first 10 medium sized experiments at varying W & T. Panels show
the value comparison (CW Heur, ADP, MMR sim) by problem number for problems 1-10:
(a) W = 10, T = 20; (b) W = 20, T = 10; (c) W = 20, T = 20.

Table 4. Percent difference of ADP over MMR for 50 randomly generated medium
sized problems

Weapons    Targets    % Δ (ADP - MMR)
10         40         -0.5 ± 2.2
20         40         0.39 ± 3.47
40         10         -0.12 ± 0.52
40         20         2.63 ± 2.66
40         40         8.4 ± 4.9

Table 5. Computational results for 100 randomly generated medium sized problems

                      Comp Time (s) (± std dev)
Weapons    Targets    CW Heur               ADP                 MMR
10         20         62.9445 ± 7.9483      0.1255 ± 0.0213     0.0082 ± 0.0008
10         40         -                     0.1153 ± 0.0279     0.0073 ± 0.0005
20         10         54.4406 ± 9.4137      0.1529 ± 0.017      0.0233 ± 0.0017
20         20         93.8378 ± 13.7356     0.1938 ± 0.0152     0.0245 ± 0.002
20         40         -                     0.197 ± 0.0282      0.0244 ± 0.0026
40         10         -                     0.1906 ± 0.0125     0.0715 ± 0.0079
40         20         -                     0.246 ± 0.0218      0.00714 ± 0.0032
40         40         -                     0.3377 ± 0.0247     0.0861 ± 0.0043

Results show a statistically insignificant difference between the ADP method and
MMR when the number of weapons is far less than the number of targets. This
is an intuitive result because, with few weapons, it will be optimal to spread them
out as evenly as possible over the highest valued targets. This suggests that any
approximation which reinforces this principle will generate very similar solutions.
However, as the number of weapons increases to a level greater than or equal to the
number of targets, the ADP outperforms on average. As further evidence of this,
confidence intervals around the difference in the means between the 1000 Monte Carlo
simulations were developed. Figures 4 and 5 demonstrate the significant improvement
gained through the use of the ADP method as problem size increases, when there are
more weapons than targets.

Figures 4 and 5 (axis label: Mean ADP - Mean MMR).


Figure 4 shows a statistically significant increase in value for approximately 90% of
the randomly generated problem instances, while Figure 5 shows improvement
96% of the time. Again, because both methods spread weapons across the highest
expected return, it is unsurprising that when the targets outnumber the weapons,
both methods are fairly consistent. Conversely, when the number of weapons is
large relative to the number of targets, the ADP substantially improves because
it accounts for the future value gained while generating first stage allocations.
The MMR approximation generally pulls weapons to one stage or the other and
allocates them fully, but the ADP method tends to spread weapons across stages,
improving the likelihood that leakers will be destroyed in stage two. This is evidenced
in the improvement for cases where the numbers of weapons and targets are equal.
Computation times for the two methods are both less than a second, with the MMR
method generally running faster, but with comparatively poorer performance. The
next set of experiments is done to see how the methods perform on problems of a
much larger size.
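
The confidence-interval comparison described above can be illustrated with a minimal sketch. It assumes paired simulation runs (the same random numbers drive both policies) and a normal approximation with z = 1.96; the source does not state the confidence level used, and `simulate_pair` is a hypothetical stand-in for the actual policy simulations, with made-up means and noise levels.

```python
import math
import random
import statistics


def paired_difference_ci(values_a, values_b, z=1.96):
    """Return (mean_diff, lower, upper) for the mean of a - b over paired runs."""
    if len(values_a) != len(values_b):
        raise ValueError("paired samples must have equal length")
    diffs = [a - b for a, b in zip(values_a, values_b)]
    mean_diff = statistics.fmean(diffs)
    # Standard error of the mean difference; pairing cancels the shared noise.
    std_err = statistics.stdev(diffs) / math.sqrt(len(diffs))
    return mean_diff, mean_diff - z * std_err, mean_diff + z * std_err


def simulate_pair(rng, n_runs=1000):
    """Hypothetical stand-in for simulating two policies under common random numbers."""
    adp, mmr = [], []
    for _ in range(n_runs):
        shock = rng.gauss(0.0, 5.0)  # noise shared by both policies in this run
        adp.append(50.0 + shock + rng.gauss(0.0, 1.0))
        mmr.append(48.0 + shock + rng.gauss(0.0, 1.0))
    return adp, mmr


rng = random.Random(0)
adp_values, mmr_values = simulate_pair(rng)
mean_diff, lower, upper = paired_difference_ci(adp_values, mmr_values)
# The difference is called significant when the interval excludes zero.
significant = lower > 0.0 or upper < 0.0
```

Because the shared noise term cancels in each paired difference, the interval is much tighter than one built from two independent samples, which is the usual motivation for common random numbers.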

4.6.2  Large  Scale  Experiments. 

For the large scale experiments, 50 randomly generated problem instances were run
using the same parameters as in Section 4.6.1. For this set of experiments, the number
of weapons and targets vary between 100, 200 and 400. Because of the insignificant
improvement when there are more targets than weapons, this analysis focuses on the
cases where the number of weapons is greater than or equal to that of the targets. With
problems of this size, solution of the greedy algorithm becomes intractable due to the
potential size of the outcome space as the algorithm progresses. Therefore, the MMR
becomes the sole benchmark to determine solution quality. Simulations are run on
the policies of each method, and the results are presented in Table 6.

Table 6. Numerical results for 100 randomly generated large sized problems

Weapons    Targets    % Difference of ADP vs. MMR
100        100        9.19 ± 4.12%
200        100        3.68 ± 2.75%
200        200        9.06 ± 5.11%
400        100        0.22 ± 0.37%
400        200        4.04 ± 2.57%
400        400        9.65 ± 4.32%

Table 7. Computation time (seconds) for 100 randomly generated large sized problems

Weapons    Targets    ADP                  MMR
100        100        1.2635 ± 0.0794      1.983 ± 0.067
200        100        1.3027 ± 0.893       2.0743 ± 0.0711
200        200        1.741 ± 0.203        2.3069 ± 0.2406
400        100        1.7233 ± 0.1198      6.2772 ± 0.1323
400        200        2.9153 ± 0.3817      9.5193 ± 1.1262
400        400        3.7753 ± 0.3781      10.4585 ± 0.5697

Results are consistent with the findings of Section 4.6.1, with a notable increase
in performance of the ADP method. As problem size increases, the ADP method
continues outperforming the two stage MMR. Additionally, computation time for the
proposed method is much more competitive as problem size increases. For problems
where there are many targets coming in at a time, this provides a quick approximation
for determining the number of weapons to save for a second stage. Figures 6 and 7
present the simulated values and confidence intervals around the difference in the
simulated means for the first ten problems of each large scale case.

The black lines in Figures 7a-7f are at y = 0. Confidence intervals lying
consistently above this line mean that the null hypothesis that the difference in the
means is zero is rejected and there is a significant difference. This is generally true in
all cases except where there are 400 weapons and 100 targets. This is likely due to the
large proportion of weapons to targets and the defined kill probabilities. The ADP
method rarely underperforms comparatively, and even when it does, the difference
in destroyed target value is very small practically speaking. Computation time,
moreover, is up to nearly four times as long for the MMR method as for the ADP
algorithm on the test problems, suggesting the desirability of the ADP method.

Figure 6. Results for first 10 large scale experiments at varying W & T. Panels show
the value comparison for problems 1-10: (a) W = 100, T = 100; (b) W = 200, T = 100;
(c) W = 200, T = 200; (d) W = 400, T = 100; (e) W = 400, T = 200; (f) W = 400, T = 400.

4.7  Conclusions  and  Future  Research 

This research develops an efficient solution algorithm for a two-stage shoot-look-shoot
scenario where the second stage target set is dependent on the first stage
allocations. Through Monte Carlo simulation, subgradients of the second stage value
function are approximated. These subgradients are then used to get an approximation
of the two-stage value for all first stage allocations. This method has been shown
to be competitive with established techniques for small to medium sized problems,
and is preferred as problem size increases. The CW heuristic is able to address problem
instances up to 20 weapons and 20 targets. For those problems, the proposed method
obtained values within 1.2% of optimal solutions found using the CW heuristic. For
larger problems, the ADP approach outperformed the MMR heuristic by
up to 8.4% on the medium sized problems and up to 9.6% on the large problem
instances. The ADP approach in [4] and further developed here offers significant
flexibility to be extended to numerous other problem formulations. First, the algorithm
can be extended to include the impact of cost on the approximation scheme as well as
the effect sensors may have in the first stage, second stage, or both. Additionally,
weapons for this effort are homogeneous within a stage, so a natural extension will
investigate non-homogeneous weapons in and across stages. Last, because the
subgradients represent the marginal increase in reserving a weapon for future stages,
the algorithm may be very effective in instances where there are more than two stages.
Therefore, additional research may extend this to multiple stages.

V. Approximate Dynamic Programming Methods for a
Cooperative Dynamic Weapon-Target Assignment Problem


5.1  Abstract 

The dynamic weapon target assignment (DWTA) problem is an extension of the
static weapon-target assignment problem in which assignments are made over time,
instead of all at once. This research investigates a cooperative version of the DWTA
problem where the single-shot probability of kill is conditional upon the current
target set. The sequential nature of the problem lends itself to solution by dynamic
programming. However, because of the curses of dimensionality, large problems often
become computationally intractable. An approximation method is proposed which
reduces the size of the decision space to be investigated. Through ordinal optimization,
a rigorous method for ensuring selection of high-quality decisions from the action
space for any given state is demonstrated. Various distributions are investigated, and
results show that near optimal solutions can be obtained in much less computation
time.

5.2  Introduction 

The weapon-target assignment (WTA) problem is a fundamental resource allocation
problem in the field of military operations research where the goal is to assign
weapons to targets such that some objective is optimized based on the number of
targets destroyed. Because of its applicability to numerous issues facing military
analysts, such as ballistic missile defense, air-to-ground operations, and integrated air
defense systems (IADS), this problem continues to be of significant operational
importance. Additionally, because of the complexity of the various formulations, the
WTA problem also maintains significance in the theoretic realm. Though it is also
found under other names, two general types of WTA problem exist: static (SWTA)
and dynamic (DWTA). In both forms, there is a single-shot probability of kill for a
given weapon-target assignment.

First proposed by Manne [63], the static formulation allocates a set of weapons
to targets at one time given that all problem information is known. Many variations of
the SWTA exist within the literature (see [66] and [31]). The dynamic case provides
an allocation policy over some time horizon, for which more information may arrive
as time progresses. Many formulations of the DWTA are found, but for each, the
underlying structure consists of the sequential allocation of weapons to targets with
some sort of observed outcome occurring between stages. First formulated by
Hosein [48], the dynamic problem has probabilistic characteristics similar to those of
the static problem, but the complexity is increased with the inclusion of a solution
over some time horizon. In the DWTA problem, weapons allocations impact the future
state space. As such, the DWTA maintains increased complexity for which few solution
techniques exist.

Though it has not been researched to the extent of the SWTA problem, the
DWTA problem provides a more realistic implementation by including a temporal
component. As such, the DWTA is a much more complex problem from a mathematical
standpoint and has received a fair amount of attention in the literature. Similar
to the SWTA, a number of methods have been employed to provide solutions for
various types of DWTA problems. As the originator of the dynamic instance, [47]
provides several results which are generalizable to the DWTA problem. Murphey [70]
[71] uses stochastic decomposition for a two-stage DWTA problem. Specific to the
general DWTA problem, Chang et al. [24] use a static WTA approximation scheme
within an iterative linear network flow framework to effectively provide high-quality
solutions for the DWTA. Because of the integer restriction for the decision variables,
the chromosome representation within a GA presents a useful scheme for solving both
the static and dynamic versions of the WTA problem. As such, much work has
developed hybrid genetic algorithms (GAs) to assist in solving the DWTA. Wu et al.
[99] apply a modified GA to the DWTA and introduce weapon use deadlines within
the problem formulation. These deadlines follow the principles of scheduling theory,
and are in the form of additional constraints such that a weapon must be shot at a
target by a specified time or it is rendered unusable. The authors call their method
a modified GA because it applies a basic GA iteratively, assigning a weapon to a
target (possibly suboptimally) immediately before the deadline is reached. In [101],
a heuristic is developed which uses problem information (domain knowledge) and
constraint programming to assign priorities to assignments. Evolutionary heuristics
which use a hybridized GA with memetic algorithms have also been applied to the
DWTA by [25]. Additionally, [54] applies a hybrid heuristic which uses a simulated
annealing (SA) type heuristic to determine the fitness of a population within a GA
framework. Other heuristic techniques applied to the DWTA include Tabu Search by
Xin et al. [102], ant colony optimization (ACO) with tabu table updates by [103], and
a modified Hungarian method with particle swarm optimization (PSO) by [56], although
this is provided in an open source text, so its rigor may be unverified. Lastly, exact
dynamic programming is applied to specific instances of the DWTA problem [91] [89].
The rest of this chapter is structured as follows. Section 5.3 defines the problem
and provides the modeling framework. Next, Section 5.4 presents the proposed
methodology, followed by some numeric examples and computational
results in Section 5.5. Finally, conclusions and future research are presented in Section
5.6.

5.3 Problem Definition


This  section  provides  the  description  of  the  problem,  to  include  assumptions  for 
the  DWTA  problem  given  in  Section  5.3.1,  followed  by  its  formulation  as  an  infinite 
horizon  discrete  time  Markov  decision  process  (MDP)  in  Section  5.3.2. 

5.3.1  Problem  Description. 

Consider  an  offensive  weapon  target  assignment  problem  consisting  of  a  known 
set  of  weapons  used  to  penetrate  an  integrated  air  defense  system,  of  which  all  targets 
are  assumed  known.  The  DWTA  divides  the  total  duration  of  the  attack  into  several 
discrete  intervals  in  which  information  is  obtained  about  the  outcomes  of  the  previous 
allocation.  Any  targets  destroyed  are  not  targeted  in  subsequent  stages,  allowing  the 
operators  to  make  better  use  of  their  weapons.  To  model  the  various  layers  of  an 
IADS,  the  problem  considers  single-shot  probabilities  which  depend  on  the  current 
target  set.  The  basic  assumptions  for  this  DWTA  formulation  are  as  follows: 

•  In  each  stage,  a  subset  of  the  remaining  weapons  is  selected  and  committed 
simultaneously 

•  The  problem  is  solved  at  each  stage  using  previous  stage  information 

•  Targets  are  present  for  the  entire  time  horizon  with  an  associated  value;  their 
value  goes  to  zero  when  destroyed 

•  The  outcome  of  each  stage  is  perfectly  observed  prior  to  the  next  stage 

•  Computing  the  optimal  assignment  for  the  current  stage  always  assumes  optimal 
assignments  will  be  made  in  subsequent  stages 

•  Weapons are allocated at each stage with the goal of optimizing the objective
value at the end of the final stage

•  Geospatial characteristics of the weapons or targets are implicitly accounted for
in their effect on the probability space

The elements of this multi-stage problem are next defined and the dynamic
programming formulation provided.

5.3.2  Problem  Formulation. 

This problem is modeled as an infinite horizon, discrete time Markov decision
process (MDP) using the collection of objects

$$\{\mathcal{T}, \mathcal{S}, \mathcal{A}, p(\cdot \mid S, a), C(S, a, W)\} \tag{5.1}$$

where $\mathcal{T}$ is the set of decision epochs, $\mathcal{S}$ is the state space,
$\mathcal{A}_S$ represents the set of allowable actions given the system is in state $S$,
with $\mathcal{A} = \bigcup_{S \in \mathcal{S}} \mathcal{A}_S$, $p(\cdot \mid S, a)$ is the
probability transition function conditioned on being in state $S$ and making decision
$a \in \mathcal{A}_S$, and $C(S, a, W)$ is the reward obtained from being in state $S$,
making decision $a$, and realizing the outcome $W$. Each of these elements is described
in greater detail below.

Let $\mathcal{T} = \{1, 2, \ldots\}$ be the set of time stages and let $t \in \mathcal{T}$
denote a specific stage. Let $S_t = (R_t, Y_t) \in \mathcal{S}$ denote the state of the
system at time $t$, where $R_t$ is a vector indicating the number of weapons (of $M$
different types) remaining in inventory and $Y_t$ is a vector indicating the number of
targets (of $N$ different types) still functioning.
$R_t = (R_{t1}, R_{t2}, \ldots, R_{tM})$, where $R_{tr}$ is the number of weapons of type
$r$ at time $t$, $r = 1, \ldots, M$. $Y_t = (Y_{t1}, Y_{t2}, \ldots, Y_{tN})$, where
$Y_{ty}$ is the number of targets of type $y$ at time $t$, each with associated value
$V_y$, $y = 1, \ldots, N$. A state $S \in \mathcal{S}$ corresponds to
a particular pair of vectors indicating the number of weapons and targets remaining.
Define $p_{ry \mid Y_t}$ as the single-shot probability of kill if weapon type $r$ is
allocated to target type $y$ given the current target set $Y_t$. Define
$q_{ry \mid Y_t} = 1 - p_{ry \mid Y_t}$ as the corresponding
probability of survival. The conditional probabilities of survival are used to model
the cooperative nature of an IADS; as certain targets are destroyed, the attacker
achieves improved probability of destroying other targets. For brevity,
$p_{ry} = p_{ry \mid Y_t}$ and $q_{ry} = q_{ry \mid Y_t}$ are henceforth used.

As with any MDP, at each time step the state determines the set of allowable
controls. Here the decision is a function of the remaining weapons and the current
set of targets in the threat environment. For any epoch, $\mathcal{A}_{S_t}$ represents
the set of allowable decisions given the system is in state $S$ at time $t$. Define the
decision variables $a_{tryj}$ as the number of weapons of type $r$ to assign to target
$j$, of type $y$, at time $t$. A matrix of decisions and the constraint set can be
defined as

$$a_t(S_t) = \begin{bmatrix}
a_{t111} & a_{t112} & \cdots & a_{t11Y_{t1}} & a_{t121} & \cdots & a_{t12Y_{t2}} & \cdots & a_{t1NY_{tN}} \\
a_{t211} & a_{t212} & \cdots & a_{t21Y_{t1}} & a_{t221} & \cdots & a_{t22Y_{t2}} & \cdots & a_{t2NY_{tN}} \\
\vdots & \vdots & & \vdots & \vdots & & \vdots & & \vdots \\
a_{tM11} & a_{tM12} & \cdots & a_{tM1Y_{t1}} & a_{tM21} & \cdots & a_{tM2Y_{t2}} & \cdots & a_{tMNY_{tN}}
\end{bmatrix} \tag{5.2}$$

and

$$\mathcal{A}_{S_t} = \left\{ a_t(S_t) \;\middle|\; \sum_{t=1}^{T} \sum_{y=1}^{N} \sum_{j=1}^{Y_{ty}} a_{tryj} \le R_{1r}, \; r = 1, 2, \ldots, M, \; a_{tryj} \ge 0 \right\} \tag{5.3}$$

Here the 0 index represents the allowable control of "do nothing". At each time step,
given a state $S_t$, action $a_t$, and outcome $W_{t+1}$, the system transitions
according to

$$S_{t+1} = S^M(S_t, a_t, W_{t+1}) \tag{5.4}$$


where $S^M(\cdot)$ is a function describing the system's dynamics. For our DWTA problem,
states transition in two distinct fashions. First, let

$$(a_{tr})_{r=1}^{M} = \left( \sum_{y=1}^{N} \sum_{j=1}^{Y_{ty}} a_{tryj} \right)_{r=1}^{M} \tag{5.5}$$

be a vector denoting the number of weapons of type $r$ fired at time $t$. Then our
weapon state transitions deterministically following

$$R_{t+1} = (R_{tr} - a_{tr})_{r=1}^{M} \tag{5.6}$$

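The target vector transitions probabilistically based upon the allocation policy at
each decision epoch. Let $\hat{Y}_{t+1,yj}$ be a random variable representing the outcome
of the $j$th target of type $y$ given a decision such that

$$\hat{Y}_{t+1,yj} = \begin{cases} 0 & \text{if target } j \text{ survives the attack,} \\ 1 & \text{if target } j \text{ is destroyed during the attack,} \end{cases} \tag{5.7}$$

for each target type $y$. Further, define

$$\hat{Y}_{t+1} = \left( \hat{Y}_{t+1,11}, \hat{Y}_{t+1,12}, \ldots, \hat{Y}_{t+1,1Y_{t1}}, \hat{Y}_{t+1,21}, \hat{Y}_{t+1,22}, \ldots, \hat{Y}_{t+1,2Y_{t2}}, \ldots, \hat{Y}_{t+1,N1}, \hat{Y}_{t+1,N2}, \ldots, \hat{Y}_{t+1,NY_{tN}} \right)^{\top} \tag{5.8}$$

then the target state element transitions following

$$Y_{t+1} = \left( Y_{ty} - \sum_{j=1}^{Y_{ty}} \hat{Y}_{t+1,yj} \right)_{y=1}^{N} \tag{5.9}$$

and

$$P\{Y_{t+1,yj} = 0 \mid S_t, a_t\} = \begin{cases} 1 - \prod_{r=1}^{M} (q_{rj})^{a_{tryj}} & \text{if } Y_{t,yj} = 1 \\ 1 & \text{if } Y_{t,yj} = 0 \end{cases} \tag{5.10}$$

$$P\{Y_{t+1,yj} = 1 \mid S_t, a_t\} = \begin{cases} \prod_{r=1}^{M} (q_{rj})^{a_{tryj}} & \text{if } Y_{t,yj} = 1 \\ 0 & \text{if } Y_{t,yj} = 0 \end{cases} \tag{5.11}$$

where $Y_{t,yj} = 1$ indicates that target $j$ of type $y$ is active at time $t$ and
$Y_{t,yj} = 0$ indicates that it has been destroyed. Here, $q_{rj}$ represents the single
shot survival probability if weapon type $r$ is shot at target $j$. This must be done
for all active targets with weapons allocated to them at time $t$. If $n_t$ denotes the
number of active targets with weapons allocated to them at stage $t$, and
$\mathcal{Y}_{t+1}$ is the set of possible outcomes known by time $t + 1$, then
$|\mathcal{Y}_{t+1}| = 2^{n_t}$.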
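
The probabilistic target transition described above can be illustrated with a minimal sketch. It makes simplifying assumptions that are not in the source: targets are indexed by a single index j rather than by the pair (y, j), the survival probabilities are held fixed rather than conditioned on the current target set, and the names `survival_probability` and `sample_stage` are hypothetical.

```python
import random


def survival_probability(q_row, shots_row):
    """Product over weapon types r of q_rj ** a_rj for one target j."""
    prob = 1.0
    for q, shots in zip(q_row, shots_row):
        prob *= q ** shots
    return prob


def sample_stage(q, shots, active, rng):
    """Sample which targets are still active after one stage.

    q[j][r]     : survival probability of target j against weapon type r
    shots[j][r] : number of type-r weapons assigned to target j
    active[j]   : 1 if target j is still functioning, else 0
    Returns the new status indicators (1 = active, 0 = destroyed).
    """
    new_status = []
    for j, alive in enumerate(active):
        if not alive:
            new_status.append(0)  # destroyed targets stay destroyed
            continue
        p_survive = survival_probability(q[j], shots[j])
        new_status.append(1 if rng.random() < p_survive else 0)
    return new_status


rng = random.Random(0)
q = [[0.5, 0.8], [0.3, 0.9], [0.6, 0.6]]  # illustrative values only
shots = [[2, 1], [0, 0], [1, 0]]
active = [1, 1, 0]
next_status = sample_stage(q, shots, active, rng)
```

In this example only the first target is both active and engaged, so the stage has two possible outcomes, in line with the $2^{n_t}$ count of outcomes for $n_t$ engaged active targets.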

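As previously discussed, each target has an associated value, $V_y$. Then the value
obtained at any time step follows

$$C_{t+1}(S_t, a_t, \hat{Y}_{t+1}) = \sum_{y=1}^{N} \sum_{j=1}^{Y_{ty}} V_y \hat{Y}_{t+1,yj} \tag{5.12}$$

where $\hat{Y}_{t+1,yj}$ is the outcome indicator of (5.7). The value of any target
destroyed during the time interval $(t, t + 1)$ is accumulated
within the cumulative objective function value. The objective is to determine a policy
$\pi \in \Pi$ mapping each state to an action which maximizes

$$\max_{\pi \in \Pi} \mathbb{E}^{\pi} \left\{ \sum_{t \in \mathcal{T}} \gamma^{t} C_{t+1}(S_t, \pi(S_t), \hat{Y}_{t+1}) \right\} \tag{5.13}$$

where $\Pi$ is the set of all possible policies and $\gamma$ is the discount factor.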

5.4  Solution  Methodology 

The SWTA problem has been shown to be NP-complete by Lloyd and Witsenhausen
[60]; therefore, any extension that contains it as a special case is at least as hard.
As conditional kill probabilities are incorporated, the sequence in which weapons are
employed becomes an important factor. The proposed solution to this problem uses
approximate dynamic programming (ADP). Section 5.4.1 introduces dynamic programming
and lays the foundation for solution using ADP. Section 5.4.2 provides a description
of the approximations used.

5.4.1 Dynamic Programming.


5.4.1.1 Value Iteration.

Dynamic programming is a well demonstrated method for solving MDPs such as
the one formulated in Section 5.3.2. At each time step $t$, the value of being in each
state is computed using Bellman's equations:

$$J_t(S_t) = \max_{a_t \in \mathcal{A}_t} \left( C_t(S_t, a_t) + \gamma \mathbb{E}\{ J_{t+1}(S_{t+1}) \mid S_t \} \right) \tag{5.14}$$

$$\phantom{J_t(S_t)} = \max_{a_t \in \mathcal{A}_t} \left( C_t(S_t, a_t) + \gamma \sum_{s' \in \mathcal{S}} P(s' \mid S_t, a_t) J_{t+1}(s') \right) \tag{5.15}$$

where the state transitions according to equation 5.4. In order to solve the problem,
the Gauss-Seidel variant of value iteration is used. This algorithm is


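Algorithm 5 Gauss-Seidel Value Iteration Algorithm

1: Initialize: Set $J^0(s) = 0 \; \forall s \in \mathcal{S}$, fix $\epsilon > 0$, set $n = 1$.
2: For each $s \in \mathcal{S}$ compute:
3: $$J^n(s) = \max_{a \in \mathcal{A}} \left\{ C(s, a) + \gamma \left[ \sum_{s' < s} p(s' \mid s, a) J^n(s') + \sum_{s' \ge s} p(s' \mid s, a) J^{n-1}(s') \right] \right\} \tag{5.16}$$
4: If $\|J^n - J^{n-1}\| < \epsilon(1 - \gamma)/2\gamma$, let $\pi^{\epsilon}$ be the resulting policy that solves 5.16, and
   let $J^{\epsilon} = J^n$ and stop; otherwise, set $n = n + 1$ and go to 2.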
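
The Gauss-Seidel sweep of Algorithm 5 can be sketched as follows. This is a minimal illustration on a hypothetical two-state problem, not the DWTA model itself: states are assumed to be indexed 0..n-1, and the callables `actions`, `reward`, and `trans` are assumed interfaces introduced only for this example.

```python
def gauss_seidel_value_iteration(n_states, actions, reward, trans, gamma, eps=1e-6):
    """Gauss-Seidel value iteration on states 0..n_states-1.

    actions(s)   -> iterable of allowable actions in state s
    reward(s, a) -> immediate reward
    trans(s, a)  -> dict mapping next state to transition probability
    """
    values = [0.0] * n_states
    threshold = eps * (1.0 - gamma) / (2.0 * gamma)
    while True:
        max_change = 0.0
        policy = [None] * n_states
        for s in range(n_states):
            best_value, best_action = float("-inf"), None
            for a in actions(s):
                # values[] already holds this sweep's entries for s' < s and the
                # previous sweep's entries for s' >= s (the Gauss-Seidel update).
                q = reward(s, a) + gamma * sum(
                    p * values[s2] for s2, p in trans(s, a).items())
                if q > best_value:
                    best_value, best_action = q, a
            max_change = max(max_change, abs(best_value - values[s]))
            values[s] = best_value
            policy[s] = best_action
        if max_change < threshold:
            return values, policy


# Hypothetical toy problem: state 0 can "stay" (reward 0) or "go" to state 1
# (reward 1); state 1 is absorbing and pays reward 2 per step.
def toy_actions(s):
    return ["stay", "go"] if s == 0 else ["stay"]


def toy_reward(s, a):
    if s == 0:
        return 1.0 if a == "go" else 0.0
    return 2.0


def toy_trans(s, a):
    if s == 0 and a == "go":
        return {1: 1.0}
    return {s: 1.0}


values, policy = gauss_seidel_value_iteration(
    2, toy_actions, toy_reward, toy_trans, gamma=0.9)
```

Updating the value list in place is what distinguishes the Gauss-Seidel variant from standard value iteration, which would keep a separate copy of the previous sweep's values.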


5.4.1.2 Approximate Dynamic Programming.

Approximate dynamic programming is a technique often used for solving high
dimensional resource allocation problems. Many applications exist within the
transportation industry [80] [81] [84] [82]. Further, ADP has been applied to sensor
management [21], multiplatform path planning [77], and stochastic scheduling [14].
Another area which has a significant amount of literature is vehicle routing with
stochastic demands [73] [86] [87] [3]. Other resource allocation applications include
activity networks for project planning [32] [93], model predictive control [22], and
high-dimensional generalized resource allocation [81], among others.

The difficulty with practically sized resource allocation problems is that they
typically grow exponentially in the state, action, or outcome spaces; the presented DWTA
problem is no different. Specifically, for this problem, the decision space grows
exponentially as a function of the state space. To illustrate this, assume an arbitrary
state $S_t$ where there are $m_t$ weapons remaining and $n_t$ targets. There are then
$(n_t + 1)^{m_t}$ possible actions over which the algorithm must iterate. Much of the
focus of approximate dynamic programming is to step forward making use of an estimate
for the future value of our states. Instead of looping over all states and actions in
exact value iteration, this research proposes a reduction of the decision space using
the principles of ordinal optimization.
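
The growth of the decision space can be checked with a short sketch. It assumes each remaining weapon is either held (index 0) or assigned to one of the $n_t$ targets, which gives the $(n_t + 1)^{m_t}$ count quoted above; the function names are hypothetical.

```python
from itertools import product


def count_actions(n_targets, n_weapons):
    """Closed-form count (n_t + 1) ** m_t of single-stage assignments."""
    return (n_targets + 1) ** n_weapons


def enumerate_actions(n_targets, n_weapons):
    """Yield every assignment as a tuple with one entry per weapon.

    Entry i is the target index (1..n_targets) chosen for weapon i,
    or 0 if that weapon is held back.
    """
    return product(range(n_targets + 1), repeat=n_weapons)


small = list(enumerate_actions(n_targets=3, n_weapons=2))
# Enumeration agrees with the closed form on a small case.
matches = len(small) == count_actions(3, 2)
# Ten weapons and ten targets already give more than 2.5e10 actions.
large = count_actions(10, 10)
```

Explicit enumeration is only practical for very small instances, which is the motivation for sampling a subset of actions rather than iterating over all of them.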

5.4.2  Value  Iteration  Using  a  Reduced  Decision  Space. 

Because of the large number of allocations for any state action pair, the use of
order statistics is proposed to reduce the size of the action space investigated during
value iteration. First, consider the method known as ordinal optimization, used by Ho
and Sreenivas to optimize discrete event dynamic systems [46]. Its purpose is to
select, for each state, a subset of decisions to investigate such that the best
decision from the selected subset is better than a pre-defined population percentile.
Ordinal optimization has been used in a wide range of simulation optimization
problems to effectively generate high quality solutions. Guan, Ho, and Lai [41] use
ordinal optimization to select a set of approximated bidding strategies for electrical
power suppliers. After the subset of options is selected, an exact solution is solved,
and the best bidding strategy is selected. Xie, Zhong, and Wu [100] apply a similar
approach to the strengthening of transmission networks through the selection of several
alternatives, for which more detailed simulations are explored prior to final selection.
This research uses ordinal optimization to select a subset of decisions for each state,
such that the probability of selecting a good enough decision can be fixed. Let
$\hat{\mathcal{A}}_{S_t} \subset \mathcal{A}_{S_t}$ and $|\hat{\mathcal{A}}_{S_t}| = K$.
Each action $a_t \in \hat{\mathcal{A}}_{S_t}$ has a subsequent expected future reward
determined using

$$C_{t+1}(S_t, a_t) = \sum_{s' \in \mathcal{S}} P(s' \mid S_t, a_t) J_{t+1}(s') \tag{5.17}$$


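Therefore, considering only the subset $\hat{A}_{s_t}$, order the samples such that
$C^{(1)}_{t+1} < C^{(2)}_{t+1} < \dots < C^{(K)}_{t+1}$; the largest order statistic
from the sample will be the value which maximizes

$$J_t(S_t) = \max_{a_t \in \hat{A}_{s_t}} \Big[ C_t(S_t, a_t) + \gamma \sum_{s' \in S} P(s' \mid S_t, a_t)\, J_{t+1}(s') \Big] \tag{5.18}$$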


Define $\varphi$ such that $0 < \varphi < 1$ as a population percentile and define
$\rho$ such that $0 < \rho < 1$ as a desired level of confidence. Then, distribution-free
confidence intervals are derived for these percentiles so long as the cumulative
distribution function $\Phi(a)$ is strictly increasing, because $\Phi(a) = \varphi$
then has a unique solution, defined as $\xi_\varphi$. Next, select a sample size $K$
which guarantees

$$P\{C^{(K)}_{t+1} \ge \xi_\varphi\} \ge \rho. \tag{5.19}$$

However,

$$P\{C^{(K)}_{t+1} \ge \xi_\varphi\} = 1 - P\{C^{(K)}_{t+1} < \xi_\varphi\} = 1 - \varphi^K, \tag{5.20}$$

which results in

$$1 - \varphi^K \ge \rho \;\Rightarrow\; K \ge \frac{\log(1 - \rho)}{\log(\varphi)}. \tag{5.21}$$


This states that, if $K$ samples are selected from any population, then with
confidence $\rho$, a fraction $\varphi$ of the population lies below the largest order
statistic $C^{(K)}_{t+1}$. This principle is used to reduce the number of actions
investigated in value iteration and, with intelligent alteration of the distribution
from which the samples are selected, results in high-quality solutions in much less
computation time. The algorithm is described in Algorithm 6.
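
The sample-size bound in (5.21) can be checked numerically. The following is a minimal sketch, assuming natural logarithms and rounding up to the next integer; the function name `min_sample_size` is hypothetical and not part of the original work.

```python
import math


def min_sample_size(phi: float, rho: float) -> int:
    """Smallest integer K satisfying 1 - phi**K >= rho, as in (5.21)."""
    if not (0.0 < phi < 1.0 and 0.0 < rho < 1.0):
        raise ValueError("phi and rho must lie strictly between 0 and 1")
    # log(phi) is negative, so dividing flips the inequality into K >= ratio.
    return math.ceil(math.log(1.0 - rho) / math.log(phi))


if __name__ == "__main__":
    for phi, rho in [(0.95, 0.95), (0.95, 0.99), (0.99, 0.95), (0.99, 0.99)]:
        print(f"phi={phi}, rho={rho}: K >= {min_sample_size(phi, rho)}")
```

For the four percentile and confidence pairs used in the comparative results of Section 5.5.3, this evaluates to K = 59, 90, 299, and 459, respectively.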


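Algorithm 6 Gauss-Seidel value iteration with a reduced decision space

1: Initialize: Set $v^0(s) = 0 \;\forall s \in S$, fix $\epsilon > 0$, set $n = 1$.
2: For each $s \in S$
3:    if $|A_s| > K$ then
4:       Generate a subset of decisions $\hat{A}_s \subset A_s$ where $|\hat{A}_s| = K$ according to $\Phi(a)$.
5:    else
6:       $\hat{A}_s = A_s$
7:    end if
8:    Compute:
9:    $$v^n(s) = \max_{a \in \hat{A}_s} \Big\{ C(s, a) + \gamma \Big( \sum_{s' < s} P(s' \mid s, a)\, v^n(s') + \sum_{s' \ge s} P(s' \mid s, a)\, v^{n-1}(s') \Big) \Big\} \tag{5.22}$$
10: If $\|v^n - v^{n-1}\| < \epsilon(1 - \gamma)/2\gamma$, let $\pi^\epsilon$ be the resulting policy that solves (5.22), and
    let $v^\epsilon = v^n$ and stop; otherwise, set $n = n + 1$ and go to 2.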
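
The structure of Algorithm 6 can be illustrated with a short sketch. This is not the implementation used for the reported experiments (those were run in MATLAB); it is a simplified Python illustration under stated assumptions: each reduced action set is drawn once, uniformly at random, rather than according to the selection distributions of Section 5.5.3, and the three-state problem at the bottom is purely hypothetical. All function and variable names are illustrative.

```python
import random


def gauss_seidel_vi(states, actions, contribution, transition, gamma, K,
                    eps=1e-6, seed=0):
    """Gauss-Seidel value iteration over a sampled action subset per state.

    states:       list of hashable states, in sweep order
    actions:      function s -> list of feasible actions (empty if terminal)
    contribution: function (s, a) -> expected single-stage reward
    transition:   function (s, a) -> dict {next_state: probability}
    """
    if not 0.0 < gamma < 1.0:
        raise ValueError("gamma must lie strictly between 0 and 1")
    rng = random.Random(seed)
    # Assumption: sample each reduced action set once, uniformly at random.
    subset = {}
    for s in states:
        full = list(actions(s))
        subset[s] = rng.sample(full, K) if len(full) > K else full

    v = {s: 0.0 for s in states}
    policy = {}
    threshold = eps * (1.0 - gamma) / (2.0 * gamma)
    while True:
        delta = 0.0
        for s in states:
            if not subset[s]:
                continue  # terminal state: value stays at zero
            best_value, best_action = None, None
            for a in subset[s]:
                future = sum(p * v[s2] for s2, p in transition(s, a).items())
                q = contribution(s, a) + gamma * future
                if best_value is None or q > best_value:
                    best_value, best_action = q, a
            delta = max(delta, abs(best_value - v[s]))
            v[s] = best_value  # in-place update: later states see the new value
            policy[s] = best_action
        if delta < threshold:
            return v, policy


# Hypothetical three-state problem; state 2 is terminal.
TOY_ACTIONS = {0: ["x", "y"], 1: ["z"], 2: []}
TOY_REWARD = {(0, "x"): 2.0, (0, "y"): 5.0, (1, "z"): 10.0}
TOY_NEXT = {(0, "x"): {1: 1.0}, (0, "y"): {2: 1.0}, (1, "z"): {2: 1.0}}


def solve_toy():
    return gauss_seidel_vi(
        states=[0, 1, 2],
        actions=lambda s: TOY_ACTIONS[s],
        contribution=lambda s, a: TOY_REWARD[(s, a)],
        transition=lambda s, a: TOY_NEXT[(s, a)],
        gamma=0.9,
        K=59,
    )
```

Updating `v[s]` in place is what makes the sweep Gauss-Seidel rather than Jacobi: states visited later in the same sweep already use the new values. In the toy problem the myopically attractive action `y` (reward 5) is replaced by `x` once the value of state 1 has propagated.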


5.5  Numeric  Results 

We begin by defining and solving a simple example to illustrate the computational
complexity of the problem. Numeric results for this simple example are presented,
including sensitivity analysis and the impact of parametric changes. Finally, the
problem is extended to a more practical size and similar results are discussed.

5.5.1 Simple Example Description.


For this example, notional future weapons concepts are investigated, each having
different capabilities. The initial state conditions are $M = 5$, $N = 3$, $m = 7$, and
$n = 4$. Let

$$R_0 = (2, 0, 2, 1, 2)^T \tag{5.23}$$

and

$$Y_0 = \begin{pmatrix} \#\,\text{SAM target type} \\ \#\,\text{Radar target type} \\ \#\,\text{C2 target type} \end{pmatrix} = \begin{pmatrix} 2 \\ 1 \\ 1 \end{pmatrix}. \tag{5.24}$$

Then, the initial state vector is $S_0 = (R_0, Y_0)$, meaning there are two weapons each
of types 1, 3, and 5, one weapon of type 4, and no weapons of type 2. Target type $y$ is
valued according to $V = (V_1, V_2, V_3)^T = (100, 200, 300)^T$. Target state transitions
are based on Table 8.


Table 8. Conditional probabilities of the state transitions (single-shot $p_{kill}$)

Weapon | All target types remain  | No SAMs remaining | No Radars remaining | No SAM or Radar
Type   | SAM     Radar   C2       | Radar    C2       | SAM      C2         | C2
-------+--------------------------+-------------------+---------------------+----------------
1      | 0.8     0.6     0.5      | 0.65     0.6      | 0.95     0.55       | 0.6
2      | 0.6     0.8     0.5      | 0.9      0.55     | 0.65     0.55       | 0.6
3      | 0.6     0.5     0.8      | 0.6      0.95     | 0.65     0.85       | 0.95
4      | 0.45    0.6     0.375    | 0.675    0.4125   | 0.4875   0.4125     | 0.45
5      | 0.45    0.375   0.6      | 0.45     0.7125   | 0.4875   0.6375     | 0.7125


Notice that the single-shot probabilities of kill are conditional probabilities that
change based upon the targets currently in the threat environment. This is used
to model a scenario in which elimination of a certain target type may degrade the
adversarial capability, thus increasing the probability of destroying a terminal target.
Here the terminal target is a command and control (C2) target, and the terminal
states are those in which all weapons have been used or the C2 target has been destroyed.
Though it is assumed the platform from which weapons are fired is out of threat
range, risk is implicitly added using a discount factor, $\gamma$.


5.5.2  Simple  Example  Solution. 

A walkthrough for the solution of our simple example is presented along with a
brief discussion of the implications of the problem formulation. Using value iteration,
all possible states and decisions must be considered. For this simple example with
seven weapons and four targets, there are 864 states, of which 556 may feasibly be
transitioned to. The computational issue faced is the number of possible decisions when
there are many weapons and targets remaining. For the initial state, $(n + 1)^m =
(4 + 1)^7 = 78{,}125$ possible decisions must be investigated. The optimal action at
$t = 0$ is

$$a^*(S_0) = \begin{pmatrix} 1 & 0 & 0 & 0 & 0 \\ 1 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 0 & 0 \end{pmatrix}$$


In this representation, columns reference weapon type $r$ and rows reference a
specific target $j$, $j = 1, 2, \dots, n$. The optimal action is to fire one weapon of
the first type ($r = 1$) at each of the active SAMs ($y = 1$), and the only weapon of
type $r = 4$ at the Radar. Note that because the SAM targets are homogeneous, the weapons
fired at them could be interchanged while remaining optimal. Using the data in Table 8
and the target values, the expected single-stage contribution is

$$C_0(S_0, a_0^*) = 0.8 \cdot 100 + 0.8 \cdot 100 + 0.6 \cdot 200 = 280 \tag{5.25}$$
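
The arithmetic behind the single-stage contribution can be reproduced with a few lines. The sketch below is illustrative only: it copies the relevant Table 8 entries for the case where all target types remain, and it assumes at most one weapon per target, so no combined kill probability is needed. The names `P_KILL`, `TARGET_VALUE`, and `expected_contribution` are hypothetical.

```python
# Single-shot kill probabilities from Table 8, "all target types remain":
# P_KILL[weapon_type][target_type]. Only the weapon types fired here are listed.
P_KILL = {
    1: {"SAM": 0.8, "Radar": 0.6, "C2": 0.5},
    4: {"SAM": 0.45, "Radar": 0.6, "C2": 0.375},
}
TARGET_VALUE = {"SAM": 100.0, "Radar": 200.0, "C2": 300.0}


def expected_contribution(assignments):
    """Expected value destroyed for a list of (weapon_type, target_type) shots.

    Assumes each target receives at most one weapon.
    """
    return sum(P_KILL[w][t] * TARGET_VALUE[t] for w, t in assignments)


# One type-1 weapon at each SAM and the type-4 weapon at the Radar.
A0_STAR = [(1, "SAM"), (1, "SAM"), (4, "Radar")]

if __name__ == "__main__":
    print(expected_contribution(A0_STAR))
```

Each type-1 shot at a SAM contributes 0.8 x 100 = 80 and the type-4 shot at the Radar contributes 0.6 x 200 = 120, giving the total of 280.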

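Recall that the weapon state transitions deterministically, so

$$R_1 = (R_{0r} - a_{0r})_{r \in \mathcal{R}} = \begin{pmatrix} 2 \\ 0 \\ 2 \\ 1 \\ 2 \end{pmatrix} - \begin{pmatrix} 2 \\ 0 \\ 0 \\ 1 \\ 0 \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \\ 2 \\ 0 \\ 2 \end{pmatrix} \tag{5.26}$$
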

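The target state, however, would transition to one of six possible target states:

$$Y_1 \in \left\{ \begin{pmatrix} 2 \\ 1 \\ 1 \end{pmatrix}, \begin{pmatrix} 1 \\ 1 \\ 1 \end{pmatrix}, \begin{pmatrix} 0 \\ 1 \\ 1 \end{pmatrix}, \begin{pmatrix} 2 \\ 0 \\ 1 \end{pmatrix}, \begin{pmatrix} 1 \\ 0 \\ 1 \end{pmatrix}, \begin{pmatrix} 0 \\ 0 \\ 1 \end{pmatrix} \right\} \tag{5.27}$$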
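
The probabilities of the six target states can be obtained by enumerating the outcomes of the three shots and aggregating outcomes that lead to the same state. The sketch below assumes the shots are independent Bernoulli trials with the Table 8 probabilities (0.8 for a type-1 weapon against a SAM, 0.6 for the type-4 weapon against the Radar); the function name is hypothetical.

```python
from itertools import product


def target_state_distribution(p_sam=0.8, p_radar=0.6):
    """Distribution of Y_1 = (#SAM, #Radar, #C2) after the first-stage action.

    Assumes each shot is an independent Bernoulli trial. The C2 target is
    not engaged in this stage, so it always survives.
    """
    dist = {}
    for sam1_dies, sam2_dies, radar_dies in product([True, False], repeat=3):
        prob = ((p_sam if sam1_dies else 1.0 - p_sam)
                * (p_sam if sam2_dies else 1.0 - p_sam)
                * (p_radar if radar_dies else 1.0 - p_radar))
        state = (2 - int(sam1_dies) - int(sam2_dies), 1 - int(radar_dies), 1)
        # Two different outcomes (either SAM destroyed) map to one SAM remaining.
        dist[state] = dist.get(state, 0.0) + prob
    return dist


if __name__ == "__main__":
    for state, prob in sorted(target_state_distribution().items()):
        print(state, round(prob, 4))
```

States with exactly one SAM remaining collect probability from two outcomes, one for each SAM that might have been destroyed, which is why their probabilities carry a factor of 2 x 0.8 x 0.2 = 0.32.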


Note that $Y_1 = (1, 1, 1)^T$ and $Y_1 = (1, 0, 1)^T$ could each be reached by two
different paths, so caution must be taken when computing their probabilities. For the
next step, assume $Y_1 = (1, 0, 1)^T$, meaning one of the SAMs and the Radar were each
destroyed and the kill probabilities transition. The optimal policy for $S_1$ is

$$a^*(S_1) = \begin{pmatrix} 0 & 0 & 0 & 0 & 2 \\ 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 \end{pmatrix} \tag{5.28}$$

meaning both weapons of type one are allocated to the remaining SAM target. As
a means of validation, it is expected that $V^*(S_1)$ and $a^*(S_1)$ would be the same
regardless of which SAM remains in the threat environment. This is confirmed
in the results. The state will now transition to one of two states:


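$$\left\{ \left( \begin{pmatrix} 0 \\ 0 \\ 2 \\ 0 \\ 0 \end{pmatrix}, \begin{pmatrix} 0 \\ 0 \\ 1 \end{pmatrix} \right), \left( \begin{pmatrix} 0 \\ 0 \\ 2 \\ 0 \\ 0 \end{pmatrix}, \begin{pmatrix} 1 \\ 0 \\ 1 \end{pmatrix} \right) \right\} \tag{5.29}$$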


where the obvious optimal decision will be to fire the remaining weapons at the C2
node, at which point the system is guaranteed to transition to a terminal state. An
item of interest comes in looking at the single-shot kill probabilities for the
different scenarios. When no radars are present, notice $p_{1,1} = 0.95$, $p_{5,1} = 0.4875$,
$p_{1,4} = 0.55$, and $p_{5,4} = 0.6375$; however, when only the C2 target remains, the
probabilities shift to $p_{1,4} = 0.6$ and $p_{5,4} = 0.7125$. It is much more advantageous
to shoot the remaining weapon of type one at the SAM because of both the value gained and
the likelihood that the system will transition to a state where only the C2 remains.


5.5.3  Numeric  results  for  the  simple  example. 

Given an exact solution for our model, numerical comparisons are presented for
the approximation techniques. Using the property from Section 5.4.2 that $\Phi(a)$ must
be strictly increasing, several discrete distributions are selected that determine how a
subset of actions is selected for each state. Let $k(a_t)$ be the number of weapons to
fire; these distributions are used to determine how many allocations are generated
for each $k(a_t)$, which will be some proportion of $K$.

5.5.3.1 Uniform Discrete Distribution.

As an initial choice, a uniform discrete distribution is used, with CDF

$$\Phi(k(a_t); m_{min}, m_s) = \frac{k(a_t) - m_{min} + 1}{m_s - m_{min} + 1} \tag{5.30}$$

where $m_{min}$ is the minimum number of weapons to fire (for our example $m_{min} = 1$),
$m_s$ is the number of weapons given the state $s \in S$, and $k(a_t) \in [m_{min}, m_s]$. This
method provides a broad exploration of the allocation space.


5.5.3.2 Binomial Distribution.

Next, problem knowledge is used to generate discrete distributions which increase
the likelihood of selecting good actions. For some initial state, unless the discount
factor is low enough, it will be suboptimal to fire all remaining weapons at once.
Similarly, firing only one weapon during a stage where numerous weapons remain in
inventory is unlikely to be optimal. Additionally, because of the combinatoric
nature of the problem, there are a greater number of possible ways of allocating
weapons if $k(a_t)$ does not lie on the bounds of $[m_{min}, m_s]$. Therefore a binomial
distribution is used to generate allocations centered around the median number of
weapons in each state.


$$\Phi(k(a_t); m_s, p) = \sum_{i=0}^{\lfloor k(a_t) \rfloor} \binom{m_s}{i} p^i (1 - p)^{m_s - i} \tag{5.31}$$

By fixing the number of trials at the number of weapons in the state, the success
probability parameter is altered to vary the shape of the distribution. The frequencies
from the distribution are then multiplied by $K$ following

$$\hat{k}(a_t) = \lceil \phi(k(a_t))\, K \rceil \tag{5.32}$$

where $\phi$ represents the binomial probability mass function and $\hat{k}(a_t)$ is the
actual number of allocations to generate for $k(a_t)$. The ceiling function guarantees
that the actual number of samples will be greater than or equal to $K$. Three examples
for $K = 59$ are presented in Figure 8.

[Figure 8 panels: Probability of success = 0.7; Probability of success = 0.5; Probability of success = 0.3]

Figure 8. Binomial Selection Distributions for $\varphi = \rho = 0.95 \Rightarrow K \ge 59$
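
The allocation counts in (5.32) are straightforward to compute. The following is a minimal sketch, assuming the counts are evaluated only for $k$ in $[m_{min}, m_s]$ with $m_{min} = 1$ as in the example; the function name `allocation_counts` and its arguments are hypothetical.

```python
import math


def allocation_counts(m_s, p, K, m_min=1):
    """Number of allocations to generate for each number of weapons fired.

    For each k in [m_min, m_s], returns ceil(pmf(k) * K), where pmf is the
    Binomial(m_s, p) probability mass function.
    """
    counts = {}
    for k in range(m_min, m_s + 1):
        pmf = math.comb(m_s, k) * p**k * (1.0 - p) ** (m_s - k)
        counts[k] = math.ceil(pmf * K)
    return counts


if __name__ == "__main__":
    # Seven weapons in inventory, success parameter 0.5, K = 59.
    print(allocation_counts(m_s=7, p=0.5, K=59))
```

With seven weapons, a success parameter of 0.5, and $K = 59$, most of the generated allocations fire three or four weapons (17 allocations each), while only one allocation fires all seven.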


5.5.3.3 Comparative Results.

A comparison of results for the various approximation schemes is now presented for
the small example. The probability parameter $\phi$ is varied for the binomial distribution
to see its impact on solution quality. Similarly, $\varphi$ and $\rho$ are varied. The results
are shown in Tables 9-12.


Table 9. Results for $\varphi = \rho = 0.95 \Rightarrow K \ge 59$

Metric                   Exact      UNIF(1,ms)  B(ms,0.7)   B(ms,0.6)   B(ms,0.5)   B(ms,0.4)   B(ms,0.3)
-----------------------  ---------  ----------  ----------  ----------  ----------  ----------  ----------
J*                       636.582    613.825     624.606     629.017     630.153     628.434     627.406
ΔJ*                      -          22.757      11.976      7.565       6.429       8.147       9.176
                                    ±9.844      ±8.834      ±5.238      ±3.598      ±4.205      ±4.375
%ΔJ*                     -          3.6%        1.3%        1.3%        1.3%        1.3%        1.4%
                                    ±1.5%       ±1.4%       ±0.8%       ±0.6%       ±0.7%       ±0.7%
Average Worst ΔJ(S)      -          51.4309     34.269      25.7421     27.0920     24.6047     25.6585
  of all states                     ±9.2023     ±6.3033     ±6.1891     ±5.775      ±5.4376     ±4.459
Average Worst ΔJ %       -          9.7%        7.1%        5.6%        6.3%        5.8%        6.5%
  of all states                     ±2%         ±1.9%       ±1.7%       ±1.8%       ±1.6%       ±1.6%
Average ΔJ ∀S            -          1.7367      1.1106      0.9213      0.9879      1.1178      1.3604
                                    ±0.1625     ±0.1441     ±0.0887     ±0.1130     ±0.1166     ±0.0998
Average ΔJ ∀S %          -          0.3%        0.2%        0.1%        0.2%        0.2%        0.2%
                                    ±<0.0001    ±<0.0001    ±<0.0001    ±<0.0001    ±<0.0001    ±<0.0001
Comp Time (sec)          5.952      0.3109      0.3122      0.3097      0.3100      0.3107      0.3035
                         ±0.0227    ±0.0015     ±0.0019     ±0.0022     ±0.0011     ±0.0015     ±0.0017

Table 10. Results for $\varphi = 0.95$, $\rho = 0.99 \Rightarrow K \ge 90$

Metric                   Exact      UNIF(1,ms)  B(ms,0.7)   B(ms,0.6)   B(ms,0.5)   B(ms,0.4)   B(ms,0.3)
-----------------------  ---------  ----------  ----------  ----------  ----------  ----------  ----------
J*                       636.582    618.235     627.842     631.346     630.912     631.583     630.474
ΔJ*                      -          18.347      8.740       5.236       5.670       4.999       6.108
                                    ±8.944      ±5.351      ±4.324      ±3.774      ±3.377      ±3.438
%ΔJ*                     -          2.9%        0.8%        0.8%        0.8%        0.8%        1.0%
                                    ±1.4%       ±0.8%       ±0.7%       ±0.6%       ±0.5%       ±0.5%
Average Worst ΔJ         -          42.1246     26.6107     20.8569     20.2236     23.1063     22.0201
                                    ±7.1965     ±4.584      ±4.6305     ±4.9219     ±5.6322     ±5.9783
Average Worst ΔJ %       -          8.2%        5.2%        4.6%        4.7%        5.7%        5.6%
                                    ±1.5%       ±1.3%       ±1.5%       ±1.6%       ±1.8%       ±1.8%
Average ΔJ ∀S            -          1.1065      0.6556      0.5364      0.5382      0.5958      0.7739
                                    ±0.1230     ±0.0849     ±0.0826     ±0.0630     ±0.0684     ±0.0702
Average ΔJ ∀S %          -          0.17%       0.1%        0.08%       0.08%       0.09%       0.12%
                                    ±<0.0001    ±<0.0001    ±<0.0001    ±<0.0001    ±<0.0001    ±<0.0001
Comp Time (sec)          5.952      0.4084      0.4132      0.4121      0.4128      0.4105      0.4123
                         ±0.0227    ±0.0021     ±0.0021     ±0.0026     ±0.0024     ±0.0014     ±0.0015


Table 11. Results for $\varphi = 0.99$, $\rho = 0.95 \Rightarrow K \ge 299$

Metric                   Exact      UNIF(1,ms)  B(ms,0.7)   B(ms,0.6)   B(ms,0.5)   B(ms,0.4)   B(ms,0.3)
-----------------------  ---------  ----------  ----------  ----------  ----------  ----------  ----------
J*                       636.582    628.943     633.115     634.9849    635.0977    634.2314    634.3026
ΔJ*                      -          7.639       3.467       1.597       1.484       2.351       2.279
                                    ±4.611      ±3.482      ±1.812      ±1.572      ±2.55       ±2.35
%ΔJ*                     -          1.2%        0.37%       0.37%       0.37%       0.37%       0.36%
                                    ±0.07%      ±0.05%      ±0.03%      ±0.02%      ±0.04%      ±0.037%
Average Worst ΔJ         -          19.0955     11.9        8.9806      8.7924      7.7779      10.4195
                                    ±3.5135     ±2.8815     ±3.1484     ±2.6031     ±1.513      ±3.5743
Average Worst ΔJ %       -          3.49%       2.24%       1.7%        1.7%        1.45%       2.1%
                                    ±0.687%     ±0.61%      ±0.67%      ±0.62%      ±0.413%     ±0.96%
Average ΔJ ∀S            -          0.3013      0.1574      0.1087      0.1115      0.1158      0.1510
                                    ±0.0637     ±0.0337     ±0.0268     ±0.0176     ±0.0197     ±0.0205
Average ΔJ ∀S %          -          0.047%      0.025%      0.017%      0.018%      0.024%      -
                                    ±<0.0001    ±<0.0001    ±<0.0001    ±<0.0001    ±<0.0001    ±<0.0001
Comp Time (sec)          5.952      0.8788      0.8800      0.8792      0.8814      0.8778      0.8782
                         ±0.0227    ±0.0036     ±0.0033     ±0.0047     ±0.0030     ±0.0031     ±0.0033

±0.0033 
Table 12. Results for $\varphi = \rho = 0.99 \Rightarrow K \ge 459$

Metric                   Exact      UNIF(1,ms)  B(ms,0.7)   B(ms,0.6)   B(ms,0.5)   B(ms,0.4)   B(ms,0.3)
-----------------------  ---------  ----------  ----------  ----------  ----------  ----------  ----------
J*                       636.582    631.74      634.8       635.37      635.58      634.59      635.26
ΔJ*                      -          4.842       1.782       1.214       1           1.99        1.324
                                    ±4.121      ±2.34       ±1.74       ±1.474      ±1.813      ±1.41
%ΔJ*                     -          0.76%       0.3%        0.3%        0.3%        0.3%        0.3%
                                    ±0.65%      ±0.37%      ±0.27%      ±0.23%      ±0.285%     ±0.22%
Average Worst ΔJ         -          16.2025     9.7996      6.9378      6.9352      7.9083      8.9068
                                    ±3.5294     ±3.7777     ±1.1341     ±0.8392     ±2.9455     ±3.0443
Average Worst ΔJ %       -          2.98%       1.83%       1.31%       1.34%       1.55%       1.76%
                                    ±3.53       ±3.78       ±1.13       ±0.84       ±2.95       ±3.04
Average ΔJ ∀S            -          0.2         0.1009      0.0703      0.0675      0.0863      0.1138
                                    ±0.0349     ±0.0283     ±0.0171     ±0.0130     ±0.0150     ±0.0168
Average ΔJ ∀S %          -          0.005%      0.004%      0.003%      0.002%      0.002%      0.003%
                                    ±<0.0001    ±<0.0001    ±<0.0001    ±<0.0001    ±<0.0001    ±<0.0001
Comp Time (sec)          5.952      1.0924      1.0933      1.0896      1.0920      1.0892      1.0911
                         ±0.0227    ±0.0049     ±0.0049     ±0.0057     ±0.0035     ±0.0042     ±0.0049


The proposed methods demonstrate a distinct reduction in computation time
compared to the exact method for each level of $K$, with a comparatively small
optimality gap. All experiments were performed using MATLAB 2013a on an Intel Xeon
X5667 with 24 GB RAM. Note that the binomial distribution performs better than
the uniform distribution in all cases. This is not surprising because the shape of the
binomial distribution should reinforce the selection of decisions which are more likely
to increase the long-term objective value. Additionally, within the family of binomial
distributions, $B(10, 0.6)$ or $B(10, 0.5)$ consistently provide better solutions in these
runs. As expected, solution quality increases as $K$ increases, though not necessarily in
a linear manner. Figure 9 shows that we get greater improvement in average $\Delta J$ (for
all states), in relation to computation time, going from $K = 59$ to $K = 90$. Further,
computation time appears fairly linear with respect to $K$.


5.5.4  Sensitivity  Analysis. 

Next, sensitivity analysis is performed on various parameters within our problem.
First, the impact that the discount factor $\gamma$ has on the optimal policy and
value function is investigated. The optimal myopic policy for the small example yields
$J^*_{myopic} = 616.75$. The values of $\gamma$ are varied and show that the dynamics of the
system become significant around $\gamma = 0.9$, where the myopic policy is no longer
optimal. Recall that $1 - \gamma$ represents the probability that the platforms are "shot
down", meaning that if the likelihood of being destroyed is > 10%, a static policy will
be optimal for the problem defined.

Figure 9. $J^* - \hat{J}^*$ by computation time

Next, a methodology for generating problem instances is presented for further
analysis. Two additional weapon types are added to the problem, along with an
additional target type representing a second type of SAM. Because of the sensitivity
of actual data, a means for computing kill probabilities with practical significance
was needed. First, bounds are set for the probability of kill for each weapon and
target type. Define $p^l_{ry}$ and $p^u_{ry}$ as the lower and upper bounds, respectively.
Next, constraints are imposed on future kill probabilities as follows. Targets are
labeled $a$, $b$, $c$, and $d$ for SAM 1, SAM 2, Radar, and C2, respectively. Additionally,
define the following events: let $E$ denote the event that all target types remain, $F$
denote the event that all SAM targets have been destroyed, $G$ denote the event that all
Radar targets have been destroyed, and $H$ denote the event that all SAM and Radar
targets have been destroyed. These events are shown in Table 13.

Table 13. List of events for defining probability constraints

Event  Active Target Types
-----  ------------------------
E      SAM 1, SAM 2, Radar, C2
F      Radar, C2
G      SAM 1 and/or SAM 2, C2
H      C2

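The conditional probability constraints are then

$$p_{rc|F} \ge p_{rc|E}, \tag{5.33}$$
$$p_{rd|F} \ge p_{rd|E}, \tag{5.34}$$
$$p_{ra|G} \ge p_{ra|E}, \tag{5.35}$$
$$p_{rb|G} \ge p_{rb|E}, \tag{5.36}$$
$$p_{rd|G} \ge p_{rd|F}, \tag{5.37}$$
$$p_{rd|H} \ge p_{rd|F}, \tag{5.38}$$
$$p_{rd|H} \ge p_{rd|G}. \tag{5.39}$$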


A nearly orthogonal Latin hypercube (NOLH) design for up to seven factors was
used to generate $p_{ry|E}$ for all $r$ and $y$. Here the factors are the target types, with
the weapons capabilities denoting the design space to be investigated. This resulted
in 17 potential weapons choices from which $M$ are selected randomly according to a
uniform distribution. Next, for each weapon type $r$, active target type $y$ (based
upon the specific event), and event $F$, $G$, and $H$, compute the following:

$$p_{ry|F} = \text{rand} \cdot (p^u_{ry} - p_{ry|E}) + p_{ry|E}, \tag{5.40}$$
$$p_{ry|G} = \text{rand} \cdot (p^u_{ry} - p_{ry|F}) + p_{ry|E}, \tag{5.41}$$
$$p_{ry|H} = \text{rand} \cdot (p^u_{ry} - p_{ry|G}) + p_{ry|G}. \tag{5.42}$$


This provides exploration of well-spaced alternatives for $a$, $b$, $c$, and $d$ and
probabilities that satisfy (5.33)-(5.39). It is assumed that weapons capabilities do not
degrade as targets are eliminated from the threat environment. Therefore, as threats
are diminished, weapons' capabilities increase in turn. Note that kill probabilities
are modeled to implicitly factor in both the effectiveness of weapons and the risk that
a weapon is shot down during employment. An example matrix which is used for
additional analysis is provided in Table 14.
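
The idea of drawing conditional kill probabilities that can only improve as threats are removed can be sketched as a chained uniform draw. The sketch below is a simplification and not the exact sampling rule of (5.40)-(5.42): it handles only the C2 target, and it assumes each successive event's probability is drawn uniformly between the previously drawn value and the upper bound, which is one simple way to keep the ordering E, F, G, H non-decreasing. The function name and arguments are hypothetical.

```python
import random


def draw_c2_chain(p_E, p_upper, rng):
    """Draw C2 kill probabilities for events F, G, and H from a baseline p_E.

    Simplifying assumption: each value is drawn uniformly between the
    previous value in the chain and the upper bound p_upper, so that
    p_E <= p_F <= p_G <= p_H <= p_upper.
    """
    if not 0.0 <= p_E <= p_upper <= 1.0:
        raise ValueError("require 0 <= p_E <= p_upper <= 1")
    p_F = p_E + rng.random() * (p_upper - p_E)
    p_G = p_F + rng.random() * (p_upper - p_F)
    p_H = p_G + rng.random() * (p_upper - p_G)
    return {"E": p_E, "F": p_F, "G": p_G, "H": p_H}


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed only for repeatability
    print(draw_c2_chain(p_E=0.59, p_upper=0.95, rng=rng))
```

The baseline of 0.59 and upper bound of 0.95 in the usage example are illustrative values only.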


Table 14. Updated conditional transition probabilities (single-shot $p_{ry}$)

Weapon | All target types remain         | No SAMs remaining | No Radars remaining     | No SAM or Radar
Type   | SAM 1   SAM 2   Radar   C2      | Radar    C2       | SAM 1   SAM 2   C2      | C2
-------+---------------------------------+-------------------+-------------------------+----------------
1      | 0.47    0.51    0.6     0.59    | 0.67     0.69     | 0.75    0.65    0.76    | 0.76
2      | 0.53    0.68    0.58    0.54    | 0.65     0.83     | 0.87    0.7     0.84    | 0.92
3      | 0.48    0.56    0.47    0.51    | 0.58     0.89     | 0.86    0.94    0.86    | 0.91
4      | 0.55    0.58    0.48    0.56    | 0.7      0.93     | 0.68    0.64    0.88    | 0.94
5      | 0.47    0.65    0.45    0.62    | 0.83     0.78     | 0.54    0.71    0.82    | 0.83
6      | 0.56    0.48    0.51    0.47    | 0.74     0.77     | 0.55    0.78    0.84    | 0.9
7      | 0.64    0.51    0.56    0.48    | 0.65     0.95     | 0.7     0.68    0.94    | 0.95


The analysis was re-run using the probabilities in Table 14; the results are
presented in Tables 15-18. This example investigates one target of each type. Based
on the results of the initial experiments, this analysis is only performed for
$B(m_s, 0.7)$, $B(m_s, 0.5)$, and $B(m_s, 0.3)$. This provides a spread of binomial
distributions for comparison, but excludes approximation using the uniform distribution
due to its relatively poor performance. Additionally, computation time for the next set
of experiments was almost identical to the initial experiments, and is omitted from
the results.

Table 15. Results for $\varphi = \rho = 0.95 \Rightarrow K \ge 59$ using updated kill probabilities

Metric                   Exact      B(ms,0.7)   B(ms,0.5)   B(ms,0.3)
-----------------------  ---------  ----------  ----------  ----------
J*                       627.2211   599.0734    612.7626    620.2505
ΔJ*                      -          28.1477     14.4585     6.9705
                                    ±10.2858    ±7.5289     ±2.999
%ΔJ*                     -          4.488%      2.3%        1.1%
                                    ±1.64%      ±1.2%       ±0.48%
Average Worst ΔJ(S)      -          73.5004     44.7731     24.1433
  of all states                     ±7.799      ±8.4349     ±8.3095
Average Worst ΔJ %       -          11.72%      7.14%       3.85%
  of all states                     ±1.24%      ±1.34%      ±1.32%
Average ΔJ ∀S            -          3.5588      1.4147      0.596
                                    ±0.2746     ±0.1277     ±0.0981
Average ΔJ ∀S %          -          0.6799%     0.6765%     0.6752%
                                    ±0.00044    ±0.0002     ±0.00016


Table 16. Results for $\varphi = 0.95$, $\rho = 0.99 \Rightarrow K \ge 90$

Metric                   Exact      B(ms,0.7)   B(ms,0.5)   B(ms,0.3)
-----------------------  ---------  ----------  ----------  ----------
J*                       627.2211   603.2401    617.1384    621.5328
ΔJ*                      -          23.981      10.0827     5.6883
                                    ±9.8287     ±5.423      ±3.1795
%ΔJ*                     -          3.82%       1.61%       0.91%
                                    ±1.57%      ±0.86%      ±0.51%
Average Worst ΔJ(S)      -          65.4856     35.7083     15.9688
  of all states                     ±10.7032    ±6.6988     ±6.1307
Average Worst ΔJ %       -          10.44%      5.69%       2.55%
  of all states                     ±1.71%      ±1.07%      ±0.98%
Average ΔJ ∀S            -          2.4767      0.8271      0.2854
                                    ±0.1545     ±0.1239     ±0.0456
Average ΔJ ∀S %          -          0.6782%     0.6756%     0.6743%
                                    ±0.00025    ±0.0002     ±0.00007

Table 17. Results for $\varphi = 0.99$, $\rho = 0.95 \Rightarrow K \ge 299$

Metric                   Exact      B(ms,0.7)   B(ms,0.5)   B(ms,0.3)
-----------------------  ---------  ----------  ----------  ----------
J*                       627.2211   614.8137    622.1174    624.61
ΔJ*                      -          12.4074     5.1037      2.6111
                                    ±5.9636     ±3.6121     ±1.4042
%ΔJ*                     -          1.98%       0.81%       0.42%
                                    ±0.95%      ±0.58%      ±0.22%
Average Worst ΔJ(S)      -          38.7502     16.2579     4.6529
  of all states                     ±5.1409     ±7.4426     ±1.5964
Average Worst ΔJ %       -          6.18%       2.59%       0.74%
  of all states                     ±0.82%      ±1.19%      ±0.25%
Average ΔJ ∀S            -          0.8467      0.1637      0.0409
                                    ±0.1146     ±0.0511     ±0.0113
Average ΔJ ∀S %          -          0.676%      0.675%      0.674%
                                    ±0.0002     ±0.0001     ±0.00002


Table 18. Results for $\varphi = \rho = 0.99 \Rightarrow K \ge 459$ using updated kill probabilities

Metric                   Exact      B(ms,0.7)   B(ms,0.5)   B(ms,0.3)
-----------------------  ---------  ----------  ----------  ----------
J*                       627.2211   617.0361    624.6009    625.2398
ΔJ*                      -          10.185      2.602       1.9813
                                    ±5.0288     ±2.5558     ±1.0368
%ΔJ*                     -          1.62%       0.42%       0.32%
                                    ±0.80%      ±0.41%      ±0.17%
Average Worst ΔJ(S)      -          33.6234     10.1703     3.6597
  of all states                     ±5.5104     ±6.0695     ±0.9198
Average Worst ΔJ %       -          5.36%       1.62%       0.58%
  of all states                     ±0.88%      ±0.97%      ±0.15%
Average ΔJ ∀S            -          0.6189      0.0808      0.026
                                    ±0.0809     ±0.0276     ±0.0065
Average ΔJ ∀S %          -          0.6752%     0.6744%     0.6743%
                                    ±0.00013    ±0.00004    ±0.00001


The results are fairly consistent with the initial experiments, with a few
exceptions. The greatest improvement seen with the second set of experiments comes with
a binomial parameter of $\phi = 0.3$, whereas the most improvement was previously
obtained using a binomial success parameter of $\phi = 0.4$, $0.5$, or $0.6$. One explanation
for this change is that the method reinforces reservation of weapons for future stages
given the updated probability tables. With the arbitrarily generated kill probabilities,
the sequential destruction of targets was not as necessary because, for each stage,
weapons which had great effect on different target types were likely present. This
early-stage effectiveness may cause the method to fire more weapons earlier.

5.5.5 Numeric results for larger problems.


Using the information gleaned from Section 5.5.3.3, the effectiveness of the
proposed method is evaluated on larger problem instances. Because of the improvement
in solution quality with a comparatively small increase in computation time, $K$ is
fixed at 59. Additionally, because the greatest improvement in solution quality was
obtained using various binomial distributions, they are investigated further in
large-scale problems. Since the size of the decision space for these problems is so
large (a problem with 10 weapons and 10 targets has $|A_{s_0}| \approx 26$ billion),
comparison with the exact optimum is computationally prohibitive. Instead, a myopic
approach is developed. In the myopic approach, the decision space becomes the set of all
possible weapons able to be allocated for a given state. Essentially, this means that
for any given initial state, $M = 5$. For each decision, a static weapon-target
assignment problem is solved through simple recursion to determine the optimal
allocation for the state-action pair while using the dynamic kill probabilities. Because
the decision space is much smaller in this case, exact value iteration can be used.
However, the decisions are now myopic due to their single-stage solution. The number of
states which must be iterated over is the primary metric in determining adequate problem
size. An example with 20 weapons and 20 targets has over seven million states, at which
point storage and computation become an issue. Therefore, for the demonstrated analysis,
the problem size is limited to ten or twelve weapons and seven or ten targets. This
limitation also provides some practicality in a geographic sense, as threats which are
farther away will likely not be considered in an optimal policy given the problem
assumptions. The cases with seven targets have two each of SAM 1, SAM 2, and Radar, with
a single C2 target. The problems with ten targets have four SAM 1 targets, three SAM 2
targets, two radars, and one C2. Weapons were arbitrarily selected such that each weapon
type had at least one, and the remainder were spread evenly across weapon types.
For the ten-weapon problems, $R_0 = (2, 1, 2, 1, 1, 2, 1)^T$, and for the twelve-weapon
problems, $R_0 = (2, 2, 2, 1, 1, 2, 2)^T$. For the ADP method, the success parameter is set
to $\phi = 0.4$, $0.5$, and $0.6$, and ten replications of each are run. The results are
reported in Table 19.
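
The decision-space sizes quoted in this chapter follow from the count $(n + 1)^m$, in which each of the $m$ weapons is either assigned to one of the $n$ targets or held in reserve. A minimal sketch (the function name is hypothetical):

```python
def decision_space_size(num_weapons: int, num_targets: int) -> int:
    """Number of allocations when each weapon goes to one target or is held."""
    return (num_targets + 1) ** num_weapons


if __name__ == "__main__":
    print(decision_space_size(7, 4))    # small example: 7 weapons, 4 targets
    print(decision_space_size(10, 10))  # large example: 10 weapons, 10 targets
```

For seven weapons and four targets this gives 78,125 decisions, and for ten weapons and ten targets it gives 25,937,424,601, or roughly 26 billion.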


Table 19. Results of large scale experiments

Weapons  Targets  Distribution  # States  J*_myopic  J*_dynamic            % Improvement
-------  -------  ------------  --------  ---------  --------------------  --------------
10       7        B(ms,0.4)     18,816    873.24     991.1161 ± 13.9186    13.5 ± 1.59%
10       7        B(ms,0.5)     18,816    873.24     967.1818 ± 19.2715    10.76 ± 2.21%
10       7        B(ms,0.6)     18,816    873.24     969.4806 ± 12.5804    11.02 ± 1.44%
12       7        B(ms,0.4)     46,570    963.39     1062.9 ± 6.2844       10.33 ± 0.65%
12       7        B(ms,0.5)     46,570    963.39     1049.3 ± 8.0023       8.92 ± 0.83%
12       7        B(ms,0.6)     46,570    963.39     1046.4 ± 8.8097       8.61 ± 0.91%
10       10       B(ms,0.4)     29,676    941.878    1026.7 ± 14.0966      9.01 ± 1.5%
10       10       B(ms,0.5)     29,676    941.878    985.1773 ± 17.9595    4.6 ± 1.9%
10       10       B(ms,0.6)     29,676    941.878    975.6688 ± 24.7253    3.5 ± 2.63%


Table 20. Computation time (in seconds) of large scale experiments

Weapons  Targets  Myopic            ADP
-------  -------  ----------------  ------------------
10       7        653.8 ± 2.27      15.4436 ± 0.0758
12       7        1,586.3 ± 7.34    38.7864 ± 0.1587
10       10       1,034.2 ± 4.38    25.2761 ± 0.1794


As expected, there is a significant improvement gained in this analysis with
the proposed method. By considering the impact current allocations have on the
future, the ADP method shows an approximate improvement of 10% over the myopic
solution. The large-scale problems also provide further evidence that, given the kill
probabilities from Table 14, it is beneficial to reserve more weapons for future stages.
Additionally, the proposed method gains validation when considering the binomial
distribution with $\phi = 0.6$. Because of the shape of the binomial distribution, more
decisions are selected which reinforce the firing of a greater number of weapons at each
stage. Firing many weapons early on does not allow for the dynamic kill probabilities
to take full effect, and solution quality degrades.
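
The improvement percentages in Table 19 appear to be relative to the myopic objective value. A minimal sketch of that calculation, assuming this definition (the function name is hypothetical):

```python
def percent_improvement(j_myopic: float, j_dynamic: float) -> float:
    """Relative improvement of the dynamic (ADP) value over the myopic value."""
    return 100.0 * (j_dynamic - j_myopic) / j_myopic


if __name__ == "__main__":
    # Mean objective values for B(ms, 0.4) from Table 19.
    for j_myopic, j_dynamic in [(873.24, 991.1161), (963.39, 1062.9), (941.878, 1026.7)]:
        print(round(percent_improvement(j_myopic, j_dynamic), 2))
```

Applied to the mean values for $B(m_s, 0.4)$, this reproduces the reported improvements of approximately 13.5%, 10.33%, and 9.01%.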

The other benefit of the proposed method is that computation time is small
considering the number of states and actions which are iterated over. As can be seen in
Table 20, the ADP method is consistently faster than the exact myopic solution. This
is due to the increase in the size of the decision space as problem size increases.

5.6  Conclusions 

This chapter presents a new class of weapon-target assignment in which kill
probabilities are dependent on the current target set and change over time. An
approximate dynamic programming solution method is introduced which incorporates a
reduced decision space using the properties of order statistics. This reduced decision
space is used to quickly provide high-quality solutions. Several distributions are
described to determine how elements from the decision space are selected. Results for
the examples tested show that solutions for small-scale problems are within 1% of
optimal using a small subset of the full decision space. The large-scale problems tested
also show vast improvement over myopic decision policies.

Future research will include investigation of different approximate dynamic programming techniques. The structure of this problem is such that the size of the decision space is prohibitively large, so methods which address this curse of dimensionality are desired. Though it is slower computationally, implementing a multi-step look-ahead solution within the myopic framework may result in better solution quality because it is an exact method in the sense that it iterates over the full state and decision spaces. Additionally, a reduced decision space could be coupled with state reduction techniques such as aggregation to further reduce computation time while maintaining solution quality. Roll-out algorithms, which take into account the future impact of current decisions, may also be implementable within a simulation framework to quickly determine optimal policies for problems of a larger size.

VI. An Integrated Simulation Framework for Optimal
Weapons-Mix Determination


6.1  Abstract 

Genetic algorithms (GAs) are often used for solving stochastic optimization problems because of their exploratory and exploitative properties. GAs can be powerful tools which effectively search a problem’s solution space, but in many cases they have their limitations. If the solution space of the problem to be investigated is too large, GAs may suffer from sub-optimality or slow convergence. Further, if the problem to be optimized is of a black-box nature, global optimality is difficult to prove. This research investigates an embedded optimization framework in which a GA is used to optimize the mix of concept weapons. A knapsack formulation is used to determine the best mix of weapons, with a weapon-target assignment problem used to determine optimal weapons capabilities. The utility of each weapon type is initially unknown and is determined through simulated employment. However, because the capabilities of each weapon type, namely the probability of destroying a target given the current target set, are unique, their allocation is dependent on the current mix of weapons being tested. Further, the sequencing and allocation policies also depend on the capabilities of the current weapons mix. A portion of the gene structure within the GA is dedicated to the sequence or allocation policy in which the weapons are used. This research proposes two solutions to this problem for a GA. First, a gene structure which includes the sequencing of weapons is proposed, and at each stage, a static weapon-target assignment problem is solved optimally to determine the weapons’ allocation. As an alternative, a method is proposed which uses approximate dynamic programming (ADP) to determine near-optimal allocation strategies in order to reduce the design space searched by the GA. In each case, the fitness function for each design point, or weapon set, is determined through simulation. Results demonstrate that the ADP method converges in fewer generations than the baseline GA, while the baseline GA converges with less computation time.

6.2  Introduction 

Combat simulations provide a means for military analysts to investigate a wide range of problems using fewer resources than testing actual systems. Many scenarios can be simulated for which real-world data could not feasibly be obtained. Air Force Research Laboratory (AFRL) analysts are developing an integrated framework which will help investigate the proposed effects of future weapons systems in a variety of scenarios. Part of this effort is to determine synergistic effects of weapons against an integrated air defense system (IADS) and consequently optimize a mix of weapons classes to load onto an aircraft. The previous optimization strategy uses a genetic algorithm (GA) which generates and updates populations of candidate solutions. These candidate solutions are tested by stepping forward and backward through time, randomly selecting allocation policies, simulating engagement outcomes, and continuing until a terminal state has been realized. Upon success of a simulated mission, a candidate allocation strategy is stored for further testing. This process is repeated for each weapons mix within the GA until a locally optimal strategy has been determined or some other termination criterion has been met.

This chapter introduces methods for optimal aircraft weaponeering using an embedded optimization framework in order to maximize the damage against a known set of targets. Embedded optimization problems use the optimal solution of one problem in order to optimize a primary objective function.

Because of their complexity, examples of embedded optimization problems are sparsely found in the literature. Some examples are the location of groundwater systems [9], incorporating chaotic maps for particle swarm optimization (PSO) parameter adaptation [6], and the optimization of hydrogen networks [106].

The primary objective function for this embedded optimization problem represents a constrained knapsack problem where the utility of each item depends on the total set of items within the knapsack. A genetic algorithm is developed to search the candidate solutions, which are then tested within a simulation to determine their combined utility. Next, a multi-stage dynamic weapon-target assignment (DWTA) problem is solved using approximate dynamic programming (ADP), which generates near-optimal sequential allocation strategies for the current set of weapons.

The  remainder  of  the  chapter  is  structured  as  follows.  Section  6.3  introduces 
the  knapsack  problem  and  gives  the  formal  definition  for  each  element,  including 
the  DWTA  subproblem.  The  GA  methodology  is  discussed  in  Section  6.4  where 
the  solution  of  the  DWTA  through  ADP  is  developed.  Next,  numerical  results  are 
presented  in  Section  6.5,  and  some  conclusions  and  areas  for  future  research  are  in 
Section  6.6. 

6.3  Problem  Formulation 

The problem is formulated as a multi-dimensional knapsack problem which represents a mix of weapons loaded on a set of aircraft. The objective for this problem is to optimize the set of weapons such that, when employed against a known set of targets, damage to the targets is maximized. The utilities are determined by solving a dynamic weapon-target assignment problem using the existing weapon set. First, the multi-dimensional knapsack problem is formally presented.

6.3.1 Multi-dimensional Knapsack Problem.


Let $x_{ij}$ denote the number of weapons of type $i$, $i = 1, 2, \ldots, M$, to load onto aircraft $j$, $j = 1, 2, \ldots, N$. Define $u_i$ as the utility, which is considered an effectiveness measure, of weapon type $i$, $c_j$ as the capacity of aircraft $j$, and $w_i$ as the size or weight of weapon $i$. Additionally, let $X_j$ be the set of weapon types that are able to go on aircraft type $j$. The objective is then

$$\max \sum_{j=1}^{N} \sum_{i=1}^{M} u_i x_{ij} \tag{6.1}$$

subject to

$$\sum_{i=1}^{M} w_i x_{ij} = c_j \quad \text{for } j = 1, 2, \ldots, N, \tag{6.2}$$

$$x_{ij} \in \mathbb{N} \text{ if } i \in X_j, \quad x_{ij} = 0 \text{ otherwise.} \tag{6.3}$$

One novelty of this problem is that the utilities are functions of the weapons currently in the solution. Let $x = (x_{11}, x_{12}, \ldots, x_{1N}, x_{21}, x_{22}, \ldots, x_{2N}, \ldots, x_{MN})$ be the current decision vector, and $u = (u_1, u_2, \ldots, u_M)$ be the current vector of weapon utility. Then $u = f(x)$, where $f(\cdot)$ is a function defined by the solution to a separate subproblem. For this research, the utilities are based upon the weapons’ effects within a dynamic weapon-target assignment problem. This problem can be solved exactly, approximately, or even estimated through simulation.
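
The dependence of the utilities on the whole mix can be illustrated with a short Python sketch. It checks constraints (6.2) and (6.3) for a candidate mix and evaluates (6.1) with utilities supplied by a callback. The names `is_feasible`, `objective`, and `toy_utility` are hypothetical, and `toy_utility` is only a stand-in for $f(x)$; in this research the utilities come from the DWTA subproblem, not from a closed-form rule.

```python
from typing import Callable, Sequence

Mix = Sequence[Sequence[int]]  # mix[i][j] = number of type-i weapons on aircraft j


def is_feasible(mix: Mix, weights: Sequence[int], capacity: Sequence[int],
                allowed: Sequence[set]) -> bool:
    """Check constraints (6.2) and (6.3) for a candidate weapons mix."""
    for j, cap in enumerate(capacity):
        load = 0
        for i, w in enumerate(weights):
            count = mix[i][j]
            if count < 0 or (count > 0 and i not in allowed[j]):
                return False  # violates (6.3)
            load += w * count
        if load != cap:
            return False  # violates the equality constraint (6.2)
    return True


def objective(mix: Mix, utility_fn: Callable[[Mix], Sequence[float]]) -> float:
    """Evaluate (6.1) with utilities u = f(x) that depend on the whole mix."""
    u = utility_fn(mix)
    return sum(u[i] * sum(row) for i, row in enumerate(mix))


def toy_utility(mix: Mix) -> list:
    """Hypothetical stand-in for f(x): diminishing returns in each type's count."""
    return [10.0 / (1 + sum(row)) for row in mix]


mix = [[2, 0], [1, 0], [0, 2]]   # 3 weapon types (rows), 2 aircraft (columns)
weights = [1, 1, 1]
capacity = [3, 2]
allowed = [{0, 1}, {2}]          # zero-based weapon types permitted on each aircraft
feasible = is_feasible(mix, weights, capacity, allowed)
total = objective(mix, toy_utility)
print(feasible, total)
```

Because the utility vector is recomputed for every candidate mix, the objective cannot be evaluated item by item as in a standard knapsack problem.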

Various methods have been used to solve knapsack problems, from dynamic programming [53] [79] [65] to numerous heuristics such as ant colony optimization (ACO) [55] [92], tabu search [43] [36], and GAs [27] [90]. Additional references can be found in [53]. Next, the DWTA problem is presented, as it forms the basis for defining weapons’ utilities.

6.3.2 Dynamic Weapon-Target Assignment Problem.

The weapon-target assignment (WTA) problem is a model of combat operations which maximizes the total expected damage caused to the enemy’s targets (or minimizes the value of leaker missiles) using a limited number of weapons. Optimally assigning interceptors to targets is a subject that has become increasingly important with the proliferation of ballistic missiles. The WTA problem is known to be NP-complete [60]. In general, two cases of the WTA problem are considered, static and dynamic. The static case concerns itself with n known targets and m known weapon types within a single stage. Optimal solution algorithms are known for two cases of the static WTA (SWTA) problem. These cases are when all the weapons are identical [30] [52] and when the targets can receive at most one weapon [24] [75]. The dynamic case can involve additional stochastic elements, multiple stages, and other unique characteristics. While no efficient exact solutions of the generalized SWTA problem exist, much research has been done to effectively determine near-optimal allocation policies [42]. Specifically, the heuristics that have been applied include generalized network flow [5], genetic algorithms [59] [99], neural networks [96], and Lagrangian relaxation [72].

6.3.2.1 Dynamic Weapon-Target Assignment Problem.

The problem is modeled as an infinite horizon, discrete time Markov decision process (MDP) using the collection of objects

$$\{\mathcal{T}, \mathcal{S}, \mathcal{A}, p(\cdot \mid S, a), C(S, a, W)\} \tag{6.4}$$

where $\mathcal{T}$ is the set of decision epochs, $\mathcal{S}$ is the state space, $\mathcal{A}_S$ represents the set of allowable actions given the system is in state $S$, with $\mathcal{A} = \bigcup_{S \in \mathcal{S}} \mathcal{A}_S$, $p(\cdot \mid S, a)$ is the probability transition function conditioned on being in state $S$ and making decision $a \in \mathcal{A}_S$, and $C(S, a, W)$ is the reward obtained from being in state $S$, making decision $a$, and realizing the outcome $W$.

Let $\mathcal{T} = \{1, 2, \ldots\}$ be the set of time stages and let $t \in \mathcal{T}$ denote a specific stage. Let $S_t = (R_t, Y_t) \in \mathcal{S}$ denote the state of the system at time $t$, where $R_t$ is a vector indicating the number of weapons (of $M$ different types) remaining in inventory and $Y_t$ is a vector indicating the number of targets (of $N$ different types) still functioning. $R_t = (R_{t1}, R_{t2}, \ldots, R_{tM})$, where $R_{tr}$ is the number of weapons of type $r$ at time $t$, $r = 1, \ldots, M$. $Y_t = (Y_{t1}, Y_{t2}, \ldots, Y_{tN})$, where $Y_{ty}$ is the number of targets of type $y$ at time $t$, each with associated value $V_y$, $y = 1, \ldots, N$. A state $S \in \mathcal{S}$ corresponds to a particular pair of vectors indicating the number of weapons and targets remaining. Define $p_{ry \mid Y_t}$ as the single-shot probability of kill if weapon type $r$ is allocated to target type $y$ given the current target set $Y_t$. Define $q_{ry \mid Y_t} = 1 - p_{ry \mid Y_t}$ as the corresponding probability of survival. The conditional probabilities of survival are used to model the cooperative nature of an IADS; as certain targets are destroyed, the attacker achieves improved probability of destroying other targets. For brevity, $p_{ry} = p_{ry \mid Y_t}$ and $q_{ry} = q_{ry \mid Y_t}$ are henceforth used.

As with any MDP, at each time step the state determines the set of allowable controls. The decision is a function of the remaining weapons and the current set of targets in the threat environment. For any epoch, $\mathcal{A}_{S_t}$ represents the set of allowable decisions given the system is in state $S$ at time $t$. Define the decision variables $a_{tryj}$ as the number of weapons of type $r$ to assign to target $j$, of type $y$, at time $t$. A matrix of decisions and the constraint set can be defined as

$$a(S_t) = \begin{bmatrix}
a_{t111} & a_{t112} & \cdots & a_{t11Y_{t1}} & a_{t121} & \cdots & a_{t12Y_{t2}} & \cdots & a_{t1NY_{tN}} \\
a_{t211} & a_{t212} & \cdots & a_{t21Y_{t1}} & a_{t221} & \cdots & a_{t22Y_{t2}} & \cdots & a_{t2NY_{tN}} \\
\vdots & \vdots & & \vdots & \vdots & & \vdots & & \vdots \\
a_{tM11} & a_{tM12} & \cdots & a_{tM1Y_{t1}} & a_{tM21} & \cdots & a_{tM2Y_{t2}} & \cdots & a_{tMNY_{tN}}
\end{bmatrix} \tag{6.5}$$

and

$$\mathcal{A}_{S_t} = \left\{ a(S_t) : \sum_{y=1}^{N} \sum_{j=1}^{Y_{ty}} a_{tryj} \le R_{tr} \text{ for } r = 1, 2, \ldots, M, \; a_{tryj} \in \mathbb{N} \right\}. \tag{6.6}$$


Here the 0 index represents the allowable control of “do nothing”. At each time step, given a state $S_t$, action $a_t$, and outcome $W_{t+1}$, the system transitions according to

$$S_{t+1} = S^M(S_t, a_t, W_{t+1}) \tag{6.7}$$

where $S^M(\cdot)$ is a function describing the system’s dynamics. For the DWTA problem, states transition in two distinct fashions. First, let

$$d_{tr} = \sum_{y=1}^{N} \sum_{j=1}^{Y_{ty}} a_{tryj} \tag{6.8}$$

be the number of weapons of type $r$ fired at time $t$. Then the weapon state transitions deterministically following

$$R_{t+1} = (R_{tr} - d_{tr})_{r=1}^{M}. \tag{6.9}$$

The target vector transitions probabilistically based upon the allocation policy at each decision epoch.

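Let $\hat{Y}_{t+1,yj}$ be a random variable representing the outcome of the $j$th target of type $y$ given a decision such that

$$\hat{Y}_{t+1,yj} = \begin{cases} 0 & \text{if target } j \text{ survives the attack,} \\ 1 & \text{if target } j \text{ is destroyed during the attack,} \end{cases} \tag{6.10}$$

for each target type $y$. Further, define

$$\hat{Y}_{t+1} = \left( \hat{Y}_{t+1,11}, \hat{Y}_{t+1,12}, \ldots, \hat{Y}_{t+1,1Y_{t1}}, \hat{Y}_{t+1,21}, \hat{Y}_{t+1,22}, \ldots, \hat{Y}_{t+1,2Y_{t2}}, \ldots, \hat{Y}_{t+1,N1}, \hat{Y}_{t+1,N2}, \ldots, \hat{Y}_{t+1,NY_{tN}} \right)^{\top}; \tag{6.11}$$

then the target state element transitions following

$$Y_{t+1,y} = Y_{ty} - \sum_{j=1}^{Y_{ty}} \hat{Y}_{t+1,yj}, \tag{6.12}$$

where

$$P\{\hat{Y}_{t+1,yj} = 0 \mid S_t, a_t\} = \begin{cases} \prod_{r=1}^{M} (q_{rj})^{a_{tryj}} & \text{if target } j \text{ is active,} \\ 1 & \text{otherwise,} \end{cases} \tag{6.13}$$

$$P\{\hat{Y}_{t+1,yj} = 1 \mid S_t, a_t\} = \begin{cases} 1 - \prod_{r=1}^{M} (q_{rj})^{a_{tryj}} & \text{if target } j \text{ is active,} \\ 0 & \text{otherwise.} \end{cases} \tag{6.14}$$

Here, $q_{rj}$ represents the single shot survival probability if weapon type $r$ is shot at target $j$. This must be done for all active targets with weapons allocated to them at time $t$. If $n_t$ denotes the number of active targets with weapons allocated to them at stage $t$, then if $\hat{\mathcal{Y}}_{t+1}$ is the set of possible outcomes known by time $t + 1$, $|\hat{\mathcal{Y}}_{t+1}| = 2^{n_t}$.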
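
To make the transition mechanics concrete, the following Python sketch simulates a single stage: it tallies the weapons fired, samples the outcome of each engaged target from the product of single-shot survival probabilities, and accumulates the value of each destroyed target. It is an illustration under stated assumptions rather than the implementation used in this research. The function names and the nested-list layout of the allocation are hypothetical, survival probabilities are indexed by weapon type and target type, and they are assumed to be already conditioned on the current target set.

```python
import random


def survival_probability(q_col, shots):
    """Probability one target survives: product over weapon types of q_r ** a_r."""
    prob = 1.0
    for q_r, a_r in zip(q_col, shots):
        prob *= q_r ** a_r
    return prob


def step(weapons, targets, allocation, q, values, rng):
    """Simulate one stage of the engagement.

    weapons[r]        -- inventory of weapon type r
    targets[y]        -- number of active targets of type y
    allocation[y][j]  -- list over weapon types r of weapons fired at target j of type y
    q[r][y]           -- single-shot survival probability for weapon r against type y
    values[y]         -- value of a target of type y
    Returns (next_weapons, next_targets, reward).
    """
    fired = [0] * len(weapons)
    for per_target in allocation:
        for shots in per_target:
            for r, a in enumerate(shots):
                fired[r] += a
    if any(f > w for f, w in zip(fired, weapons)):
        raise ValueError("allocation exceeds inventory")

    next_targets = list(targets)
    reward = 0.0
    for y, per_target in enumerate(allocation):
        q_col = [q[r][y] for r in range(len(weapons))]
        for shots in per_target:
            # The target is destroyed unless the uniform draw falls below its survival probability.
            if rng.random() >= survival_probability(q_col, shots):
                next_targets[y] -= 1
                reward += values[y]
    next_weapons = [w - f for w, f in zip(weapons, fired)]   # deterministic weapon update
    return next_weapons, next_targets, reward


# Degenerate probabilities make the outcome certain: type-0 target dies, type-1 survives.
demo = step([2, 1], [1, 1], [[[1, 0]], [[0, 1]]],
            [[0.0, 1.0], [1.0, 1.0]], [100.0, 150.0], random.Random(0))
print(demo)
```

A target with no weapons allocated has survival probability one in this sketch, so it is never removed from the target vector.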

As previously discussed, each target has an associated value, $V_y$. Then the value obtained at any time step follows

$$C_{t+1}(S_t, a_t, \hat{Y}_{t+1}) = \sum_{y=1}^{N} \sum_{j=1}^{Y_{ty}} V_y \hat{Y}_{t+1,yj}. \tag{6.15}$$

We accumulate the value of any target destroyed during the time interval $(t, t+1)$. The objective is to determine a policy $\pi \in \Pi$, mapping each state to an action, that solves

$$\max_{\pi \in \Pi} \mathbb{E}^{\pi} \left\{ \sum_{t \in \mathcal{T}} \gamma^{t} C_t(S_t, A_t^{\pi}(S_t)) \right\} \tag{6.16}$$

where $\Pi$ is the set of all possible policies and $\gamma$ is the discount factor.

The DWTA problem provides a more practical implementation by including a temporal component. As such, the DWTA is a much more complex problem from a mathematical standpoint, and it has received a fair amount of attention in the literature. Similar to the SWTA, numerous methods have been employed to provide solutions for various types of DWTA problems. As the originator of the dynamic instance, [47] provides several results which are generalizable to the DWTA problem. [70] and [71] use stochastic decomposition for the two-stage problem previously defined. An extension of the generalized two-stage problem called the shoot-look-shoot target assignment problem also has a fair amount of associated literature, but as it is fundamentally different, it is not discussed herein. Specific to the general DWTA problem, [24] uses a static WTA approximation scheme within an iterative linear network flow framework to effectively provide high-quality solutions for the DWTA. Because of the integer restriction for the decision variables, the chromosome representation within a GA presents a useful scheme for solving both the static and dynamic versions of the WTA problem. As such, much work has developed hybrid GAs to assist in solving the DWTA. [99] apply a modified GA to the DWTA and introduce weapon use deadlines within the problem formulation. These deadlines follow the principles of scheduling theory, and are in the form of additional constraints such that a weapon has to be shot at a target by a specified time or it is rendered unusable. The authors call their method a modified GA because it applies a basic GA iteratively, assigning a weapon to a target (possibly suboptimally) immediately before the deadline is reached. [101] develop a heuristic which uses problem information (domain knowledge) and constraint programming to assign priorities to assignments. Evolutionary heuristics, which use a hybridized GA with memetic algorithms, have also been applied to the DWTA by [25]. Additionally, [54] applies a hybrid heuristic which uses a simulated annealing (SA) type heuristic to determine the fitness of a population within a GA framework. Other heuristic techniques applied to the DWTA include tabu search [102], ACO with tabu table updates [103], and a modified Hungarian method with PSO [56] (though this is in an open source text, so its rigor may be unverified). Lastly, exact dynamic programming [91] [89] has also been applied to the DWTA. The last portion of the WTA literature review focuses on the specific shoot-look-shoot scenario, as well as some miscellaneous methods which are not explicitly weapon-target assignment problems.

6.4  Methodology 

In this section the solution approach is discussed, including the specific details of the GA and the near-optimal allocation generation using ADP. Finally, the integrated framework is introduced, and the various algorithms are presented.

6.4.1  Genetic  Algorithms. 

Because of the problem’s complexity and the stochastic nature of the decision variable utilities, an optimal mix of weapon types under constraints may not be efficiently obtained through traditional optimization methods. Because their heuristic characteristics can be specifically designed, GAs provide a flexible means for investigating combinatorial optimization problems, especially those with integer solutions. Genetic algorithms are search procedures intended to mimic the natural evolution of biological systems, in which characteristics that improve fitness are selected in lieu of those that do not. Genetic algorithms have been shown effective in a wide range of resource allocation problems including project scheduling [45] [44], knapsack problems [27] [90], and target assignment [99] [25]. The general steps of a GA are presented in Algorithm 7.

Algorithm 7 General steps of a GA

Initialize
    •  Generate population P
    •  Set parent selection, mutation, and crossover parameters
while number of generations has not been reached do
    •  Determine fitness of each population member
    •  Select the parent population for mating
    •  Generate offspring using crossover rules and parent population
    •  Ensure feasibility of offspring and correct any infeasibility
    •  Determine any mutated member(s)
end while

The  GA  is  developed  by  structuring  the  gene,  computing  the  fitness  function, 
determining  how  to  select  the  parent  population,  and  dictating  how  offspring  are 
generated  through  crossover  and  mutation. 
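
The loop in Algorithm 7 can be sketched in Python as a function that receives the problem-specific operators as arguments. This is a generic illustration, not the code used for the experiments: the function `run_ga`, its parameter names, the default elite fraction of 0.2, and the bit-string toy problem are all assumptions. Repair is applied after mutation here so that every stored gene is feasible, which differs slightly from the order in which Algorithm 7 lists the steps.

```python
import random


def run_ga(fitness, random_gene, crossover, mutate, repair,
           pop_size=50, generations=50, elite_frac=0.2, seed=0):
    """Generic GA loop (maximization); returns the best gene seen."""
    rng = random.Random(seed)
    population = [random_gene(rng) for _ in range(pop_size)]
    best = max(population, key=fitness)
    n_elite = max(1, int(elite_frac * pop_size))
    for _ in range(generations):
        ranked = sorted(population, key=fitness, reverse=True)   # fitness evaluation
        elites = ranked[:n_elite]                                # parent selection
        offspring = []
        while len(offspring) < pop_size:
            mother = rng.choice(elites)
            father = rng.choice(ranked)
            for child in crossover(mother, father, rng):         # crossover
                offspring.append(repair(mutate(child, rng), rng))  # mutation, then repair
        population = offspring[:pop_size]
        best = max(population + [best], key=fitness)
    return best


# Toy problem (purely illustrative): maximize the number of ones in a bit string.
def random_gene(rng):
    return [rng.randint(0, 1) for _ in range(8)]


def one_point(a, b, rng):
    cut = rng.randrange(1, len(a))
    return [a[:cut] + b[cut:], b[:cut] + a[cut:]]


def flip_one(gene, rng, rate=0.05):
    gene = list(gene)
    if rng.random() < rate:
        k = rng.randrange(len(gene))
        gene[k] = 1 - gene[k]
    return gene


def no_repair(gene, rng):
    return gene


best = run_ga(sum, random_gene, one_point, flip_one, no_repair, generations=20)
print(best, sum(best))
```

When the fitness is estimated by simulation, as it is in this chapter, each evaluation is expensive and noisy, so a practical implementation would likely cache fitness values rather than recompute them inside the sort.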

6.4.1.1 Gene Structure.

For a knapsack problem with known utilities or value, the gene consists of a string of N integer elements defining a feasible mix of weapons [27]. Because the weapon utility is uncertain, the gene is structured to accommodate allocation information used to solve the DWTA during simulation. Specific elements of the gene are also designated for each aircraft type to ensure feasibility with constraint (6.2). For the first method, the gene includes a string of integer characters representing the time step in which each weapon is to be fired. Define $T$ as the maximum number of engagement time periods and $s_k$ as the stage in which the $k$th weapon will be used, $k = 1, 2, \ldots, M_j$ for $j = 1, 2, \ldots, N$, and $s_k \in \{1, 2, \ldots, T\}$. An example is shown in Figure 10 with two aircraft being used. In this example, aircraft one has a capacity of eight, aircraft two has a capacity of two, and $w_i = 1$ for $i = 1, 2, \ldots, N = 7$. For this example, an additional constraint is induced that the four weapon types which can fit on aircraft one cannot fit on aircraft two, and though the other three weapon types are able to be placed on aircraft two, their capabilities are such that they will not be selected for inclusion on aircraft one in an optimal solution. For the example shown, the current gene has zero weapons of type one, four weapons of type two, two weapons each of types three and four, and one each of weapon types five and seven. Additionally, this genetic structure provides the stage in which the weapons are to be fired. With $T = 4$, one each of weapon types two, three, and four are fired in stage one, followed by one each of weapon types three, four, five, and seven in stage two, and the remaining weapons of type two are fired in stages three and four.

Figure 10. Gene structure for method one (gene segments: current mix of weapons; stage in which weapons are fired)
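
The example gene can be decoded with a few lines of Python. The sketch below expands the weapons-mix portion into one entry per weapon and groups the weapons by firing stage. The list layout is hypothetical, and because the text does not say how the last three type-two weapons are split between stages three and four, the split used here is an assumption. Weapon type six is absent from this mix, so stage two contains types three, four, five, and seven.

```python
from collections import defaultdict

# Weapons-mix portion of the example gene: count of each weapon type 1..7.
mix = [0, 4, 2, 2, 1, 0, 1]

# Expand the mix into one entry per weapon, recording its type.
weapon_types = [t for t, count in enumerate(mix, start=1) for _ in range(count)]

# Sequencing portion: firing stage for each weapon, with T = 4.
# The split of the last three type-two weapons between stages 3 and 4 is assumed.
stages = [1, 3, 3, 4,   # four weapons of type two
          1, 2,         # two weapons of type three
          1, 2,         # two weapons of type four
          2,            # one weapon of type five
          2]            # one weapon of type seven

fired_in_stage = defaultdict(list)
for weapon_type, stage in zip(weapon_types, stages):
    fired_in_stage[stage].append(weapon_type)

for stage in sorted(fired_in_stage):
    print(stage, fired_in_stage[stage])
```

The first four types sum to eight and the last three sum to two, matching the two aircraft capacities in the example.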

The second method uses the weapon mix portion of the gene structure, but in place of the stage selection, each gene is used to solve a DWTA problem through ADP. The ADP solution methods are from [76] and generate near-optimal allocations for any mix of weapons based on the targets represented in the simulation. Figure 11 presents this solution framework.

Figure 11. Simulation framework using ADP solution of DWTA

6.4.1.2 Initial Population.

Similar to the work of Chu and Beasley [27], the initial population size is set to P = 50, and genes are generated randomly. Feasibility of each gene in the initial population is ensured by randomly adding weapons to slots on the aircraft until constraint (6.2) has been satisfied. This operation is performed independently for each aircraft $j = 1, 2, \ldots, N$. Once a feasible gene structure has been generated for each aircraft, they are combined to make a full gene. The genes of the initial population are used within the simulation framework to determine their relative fitness before parent selection.
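
A minimal Python sketch of this initialization is given below. It fills each aircraft independently by adding randomly chosen permitted weapons until the capacity is met exactly, then concatenates the per-aircraft counts. The function names and the zero-based indexing are hypothetical, and the sketch assumes that at least one permitted weapon type on each aircraft has unit weight, so the capacity can always be reached.

```python
import random


def random_aircraft_load(capacity, weights, allowed, rng):
    """Add randomly chosen permitted weapons until the load equals the capacity."""
    counts = {i: 0 for i in allowed}
    remaining = capacity
    while remaining > 0:
        # Only weapon types that still fit are candidates (assumes a unit-weight type exists).
        fits = [i for i in allowed if weights[i] <= remaining]
        i = rng.choice(fits)
        counts[i] += 1
        remaining -= weights[i]
    return counts


def random_gene(capacities, weights, allowed_by_aircraft, rng):
    """Generate each aircraft's load independently and concatenate into one gene."""
    gene = []
    for cap, allowed in zip(capacities, allowed_by_aircraft):
        load = random_aircraft_load(cap, weights, allowed, rng)
        gene.extend(load[i] for i in allowed)
    return gene


rng = random.Random(0)
weights = [1] * 7                                  # unit weight for all seven types
allowed_by_aircraft = [[0, 1, 2, 3], [4, 5, 6]]     # zero-based weapon types per aircraft
population = [random_gene([8, 2], weights, allowed_by_aircraft, rng) for _ in range(50)]
print(population[0])
```

The capacities of eight and two match the gene-structure example above; other capacities only require changing the first argument.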


6.4.1.3 Fitness Determination.

For each method, a Monte Carlo simulation is used to determine the fitness of the current mix of weapons. For the first method, a static weapon-target assignment (SWTA) problem is solved using the weapons fired during a specific stage. Because the number of weapons to be fired at any given stage is generally small, a recursive method is used to optimally generate single stage assignments. The second method, however, uses ADP to solve the DWTA formulated in Section 6.3.2.1 for the current gene. Allocation policies for all possible states are approximated using the methods discussed in Section 6.4.2. These policies are used as inputs within the Monte Carlo simulation via a lookup table. As the simulation steps through time, allocations are applied based on the DWTA outputs, the outcomes are simulated, and the state is updated. In each case, 1,000 simulations are run to determine the expected weapons effectiveness against the targets. This average value is then used as the fitness within the GA.
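
The lookup-table simulation can be illustrated with a deliberately simplified Python sketch. To keep it short, the state is collapsed to a single weapon type and a single target type, which is much coarser than the state defined in Section 6.3.2.1, and the kill probability is held constant rather than conditioned on the target set. The function names and the dictionary-based policy are hypothetical; the sketch only shows how a policy table drives the simulation and how the fitness is averaged over 1,000 runs.

```python
import random


def simulate_mission(weapons, targets, policy, kill_prob, value, max_stages, rng):
    """Run one engagement and return the total value of destroyed targets.

    policy[(weapons, targets)] gives the number of weapons fired at each active target.
    """
    total = 0.0
    for _ in range(max_stages):
        if weapons == 0 or targets == 0:
            break
        shots = min(policy.get((weapons, targets), 0), weapons // targets)
        survivors = 0
        for _ in range(targets):
            # A target survives if every one of the `shots` weapons misses.
            if rng.random() < (1.0 - kill_prob) ** shots:
                survivors += 1
        total += value * (targets - survivors)
        weapons -= shots * targets
        targets = survivors
    return total


def fitness(weapons, targets, policy, kill_prob, value,
            max_stages=4, runs=1000, seed=0):
    """Average mission value over `runs` Monte Carlo replications."""
    rng = random.Random(seed)
    outcomes = [simulate_mission(weapons, targets, policy, kill_prob, value, max_stages, rng)
                for _ in range(runs)]
    return sum(outcomes) / runs


# Hypothetical policy: fire one weapon at each active target whenever inventory allows.
policy = {(w, t): 1 for w in range(1, 7) for t in range(1, 3) if w >= t}
estimate = fitness(6, 2, policy, kill_prob=0.5, value=100.0, max_stages=3)
print(estimate)
```

In this toy setting each target is engaged once per stage for up to three stages, so the expected value is 2 × 100 × (1 − 0.5³) = 175, and the Monte Carlo estimate should fall close to that figure.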

6.4.1.4 Selection of Parent Population.

Parent selection is the determination and assignment of individuals in the population which have comparatively favorable genes that should be passed on to the offspring. Two parents are selected and crossover operators are used to generate offspring. An elitist model is employed within the GA where the top $\zeta$% of genes are selected as primary mates. The remainder of the population is divided equally amongst the primary mates to make the next generation. This portion of the population is called secondary mates. The top $P(1 - \zeta)/(P\zeta)$ secondary mates are assigned to the top primary mate, the next group of secondary mates is assigned to the second primary mate, and so on. This parental scheme is employed in both GA methods investigated.
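
The pairing of primary and secondary mates can be sketched in Python as follows. The function name `assign_mates` and the treatment of the elite share as a fraction between 0 and 1 are assumptions, as is the choice to drop any secondary mates left over when the groups do not divide evenly.

```python
def assign_mates(ranked_population, elite_fraction):
    """Pair primary mates (elites) with equal-sized groups of secondary mates.

    ranked_population must be sorted from most fit to least fit.
    Returns a list of (primary, secondary) parent pairs.
    """
    size = len(ranked_population)
    n_primary = max(1, round(size * elite_fraction))
    primaries = ranked_population[:n_primary]
    secondaries = ranked_population[n_primary:]
    group = len(secondaries) // n_primary   # secondary mates per primary mate, rounded down
    pairs = []
    for k, primary in enumerate(primaries):
        # The k-th primary mate receives the k-th block of secondary mates.
        for secondary in secondaries[k * group:(k + 1) * group]:
            pairs.append((primary, secondary))
    return pairs


# Ten genes labelled by rank (0 is the fittest), with the top 20% as primary mates.
pairs = assign_mates(list(range(10)), 0.2)
print(pairs)
```

With ten genes and an elite fraction of 0.2, the two primary mates each receive four secondary mates, giving eight parent pairs.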

6.4.1.5 Crossover and Mutation.

The integer representation of the genetic structure allows for an easy crossover operator implementation. Because GAs are generally insensitive to crossover operator choice [27], the crossover point is generated randomly based on uniform selection. For method one, a second crossover point is included which is restricted to the sequencing portion of the genetic structure. This allows for better exploration of the design space. This second crossover point is also uniformly selected. An example of the crossover operator is shown in Figure 12.

Figure 12. Crossover operator for method one

The crossover operator for method two is restricted to a single crossover point in the weapons mix portion of the gene structure. In this scheme, each set of parents generates two children, so the size of the next generation remains constant. Based on the recommendations of Chu and Beasley [27], these operators were arbitrarily selected, but were kept because computational results were positive. Additionally, a mutation parameter $\eta$ is implemented to determine if any offspring elements are changed randomly. The mutation probability can help increase or decrease exploration, but is traditionally set to a small value. A random check occurs for each gene, and when applicable, a new value is randomly selected for a single element of the weapons mix gene structure.
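
One possible reading of these operators is sketched below in Python. Each gene is a flat list holding the weapons-mix portion followed by the sequencing portion. How the two crossover points interact is not specified in the text, so the segment-swapping rule used here is an assumption, as are the function names and the `max_count` bound on a mutated element. Children produced this way may violate constraint (6.2) and would still need to be repaired.

```python
import random


def crossover_method_one(parent_a, parent_b, mix_len, rng):
    """Two-point crossover: one cut in the mix portion, one in the sequencing portion.

    Both parents are (mix + sequence) lists of equal length; returns two children.
    """
    cut_mix = rng.randrange(1, mix_len)                    # uniform cut inside the mix portion
    cut_seq = rng.randrange(mix_len + 1, len(parent_a))    # uniform cut inside the sequencing portion
    child_1 = (parent_a[:cut_mix] + parent_b[cut_mix:mix_len]
               + parent_a[mix_len:cut_seq] + parent_b[cut_seq:])
    child_2 = (parent_b[:cut_mix] + parent_a[cut_mix:mix_len]
               + parent_b[mix_len:cut_seq] + parent_a[cut_seq:])
    return child_1, child_2


def mutate(gene, mix_len, max_count, rate, rng):
    """With probability `rate`, replace one weapons-mix element with a random count."""
    gene = list(gene)
    if rng.random() < rate:
        k = rng.randrange(mix_len)
        gene[k] = rng.randint(0, max_count)
    return gene


rng = random.Random(3)
parent_a = [0] * 17          # 7 mix elements followed by 10 sequencing elements
parent_b = [1] * 17
child_1, child_2 = crossover_method_one(parent_a, parent_b, 7, rng)
print(child_1)
print(child_2)
```

For method two, the same function restricted to the first cut (leaving the sequencing portion out of the gene) would give the single-point crossover described above.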


6.4.1.6 Offspring Feasibility Correction.

In certain cases, the offspring created by crossover and mutation operations are infeasible because of the equality constraints of (6.2). To guarantee feasibility, a repair operator is applied based on the offspring gene and random selection. If the equality constraints are not satisfied, randomly selected elements from the gene structure are either increased or decreased until feasibility is regained. This represents adding or removing weapons from aircraft so that the aircraft always carries the maximum allowable load. This repair operator was selected to increase the exploration capacity of the GA. The repair algorithm is as follows:


Algorithm 8 GA offspring gene repair operator

if $\sum_{i=1}^{M} w_i x_{ij} < c_j$ for any $j$ then
    For each $j$ where (6.2) is violated
    while $\sum_{i=1}^{M} w_i x_{ij} < c_j$ do
        •  Randomly select an element $i$ from the gene structure for aircraft $j$
        •  Set $x_{ij} = x_{ij} + 1$
    end while
else
    if $\sum_{i=1}^{M} w_i x_{ij} > c_j$ for any $j$ then
        For each $j$ where (6.2) is violated
        while $\sum_{i=1}^{M} w_i x_{ij} > c_j$ do
            •  Randomly select an element $i$ from the gene structure for aircraft $j$
            •  Set $x_{ij} = x_{ij} - 1$
        end while
    end if
end if


This  operator  is  easily  implemented  and  provides  further  exploration  of  the  design 
space  because  of  its  random  nature. 
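
A Python sketch of the repair operator for a single aircraft is given below. It follows the increase-or-decrease logic of Algorithm 8 but adds two guards that are assumptions of this sketch and not part of the algorithm as stated: an element is incremented only if the weapon still fits, and decremented only if its count is positive. The function name `repair` is hypothetical, and the sketch assumes at least one weapon type has unit weight so that the loop terminates.

```python
import random


def repair(counts, weights, capacity, rng):
    """Adjust one aircraft's weapon counts until sum(w_i * x_i) equals the capacity."""
    counts = list(counts)
    load = sum(w * x for w, x in zip(weights, counts))
    while load != capacity:
        if load < capacity:
            # Under capacity: add a randomly chosen weapon that still fits.
            candidates = [i for i, w in enumerate(weights) if w <= capacity - load]
            change = 1
        else:
            # Over capacity: remove a randomly chosen weapon that is present.
            candidates = [i for i, x in enumerate(counts) if x > 0]
            change = -1
        i = rng.choice(candidates)
        counts[i] += change
        load += change * weights[i]
    return counts


rng = random.Random(0)
under = repair([5, 0, 0, 0], [1, 1, 1, 1], 8, rng)   # three weapons short
over = repair([9, 3, 0, 0], [1, 1, 1, 1], 8, rng)    # four weapons too many
print(under, over)
```

Applying the function to each aircraft's portion of the gene in turn restores feasibility with respect to constraint (6.2) for the whole gene.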

6.4.2  Solution  of  the  DWTA. 

As stated, Method 2 integrates an approximate dynamic programming routine to reduce the space investigated by the GA. Instead of using the design structure presented in Section 6.4.1, the design point is updated to the following:

$$D(P) = (R_1, R_2, \ldots, R_m) \tag{6.17}$$

This design point is then fed to the ADP routine, which determines $a_{tij}$ for all $t \in \mathcal{T}$, $i = 1, \ldots, m$, and $j = 1, \ldots, n$. This is represented in Figure 11.


6.5  Numerical  Results  and  Discussion 


Two sets of experiments were performed to determine the efficacy of each method. The first set of experiments explores the case where $c_1 = 6$ and $c_2 = 2$, and there are five targets following $Y_t = (2, 1, 1, 1)^{\top}$ and $V = (100, 150, 200, 300)$. As in [76], the probabilities which define state transitions for both sets of experiments are defined in Table 21.

Table 21. Updated conditional transition probabilities

              | Single shot $p_{ij}$            | No SAMs        | No Radars              | No SAM
              | (all target types remain)       | remaining      | remaining              | or Radar
Weapon Type   | SAM 1   SAM 2   Radar   C2      | Radar   C2     | SAM 1   SAM 2   C2     | C2
1             | 0.47    0.51    0.6     0.59    | 0.67    0.69   | 0.75    0.65    0.76   | 0.76
2             | 0.53    0.68    0.58    0.54    | 0.65    0.83   | 0.87    0.7     0.84   | 0.92
3             | 0.48    0.56    0.47    0.51    | 0.58    0.89   | 0.86    0.94    0.86   | 0.91
4             | 0.55    0.58    0.48    0.56    | 0.7     0.93   | 0.68    0.64    0.88   | 0.94
5             | 0.47    0.65    0.45    0.62    | 0.83    0.78   | 0.54    0.71    0.82   | 0.83
6             | 0.56    0.48    0.51    0.47    | 0.74    0.77   | 0.55    0.78    0.84   | 0.9
7             | 0.64    0.51    0.56    0.48    | 0.65    0.95   | 0.7     0.68    0.94   | 0.95
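
As an illustration of how Table 21 might be consulted during a simulation, the sketch below stores the table as a dictionary and selects the column group from the set of remaining targets. The names `TABLE_21`, `column_group`, and `single_shot_probability` are hypothetical, as is the data layout. The rule for choosing a column group is an assumption: "no SAMs remaining" is read here as both SAM types being destroyed, so a state with only one SAM type destroyed still uses the baseline columns.

```python
# Table 21 by weapon type. Each row has four column groups, keyed by which
# target types remain. Layout and key names are assumptions for illustration.
TABLE_21 = {
    1: {"all": (0.47, 0.51, 0.6, 0.59), "no_sam": (0.67, 0.69),
        "no_radar": (0.75, 0.65, 0.76), "no_sam_or_radar": (0.76,)},
    2: {"all": (0.53, 0.68, 0.58, 0.54), "no_sam": (0.65, 0.83),
        "no_radar": (0.87, 0.7, 0.84), "no_sam_or_radar": (0.92,)},
    3: {"all": (0.48, 0.56, 0.47, 0.51), "no_sam": (0.58, 0.89),
        "no_radar": (0.86, 0.94, 0.86), "no_sam_or_radar": (0.91,)},
    4: {"all": (0.55, 0.58, 0.48, 0.56), "no_sam": (0.7, 0.93),
        "no_radar": (0.68, 0.64, 0.88), "no_sam_or_radar": (0.94,)},
    5: {"all": (0.47, 0.65, 0.45, 0.62), "no_sam": (0.83, 0.78),
        "no_radar": (0.54, 0.71, 0.82), "no_sam_or_radar": (0.83,)},
    6: {"all": (0.56, 0.48, 0.51, 0.47), "no_sam": (0.74, 0.77),
        "no_radar": (0.55, 0.78, 0.84), "no_sam_or_radar": (0.9,)},
    7: {"all": (0.64, 0.51, 0.56, 0.48), "no_sam": (0.65, 0.95),
        "no_radar": (0.7, 0.68, 0.94), "no_sam_or_radar": (0.95,)},
}

# Target types covered by each column group, in table order.
GROUP_TARGETS = {
    "all": ("SAM 1", "SAM 2", "Radar", "C2"),
    "no_sam": ("Radar", "C2"),
    "no_radar": ("SAM 1", "SAM 2", "C2"),
    "no_sam_or_radar": ("C2",),
}


def column_group(remaining):
    """Pick the column group from a dict of remaining target counts by type."""
    sams_left = remaining.get("SAM 1", 0) + remaining.get("SAM 2", 0) > 0
    radars_left = remaining.get("Radar", 0) > 0
    if sams_left and radars_left:
        return "all"
    if radars_left:
        return "no_sam"
    if sams_left:
        return "no_radar"
    return "no_sam_or_radar"


def single_shot_probability(weapon, target, remaining):
    """Look up the Table 21 entry for `weapon` against `target` in this state."""
    group = column_group(remaining)
    # Raises ValueError if `target` is not listed for this column group.
    return TABLE_21[weapon][group][GROUP_TARGETS[group].index(target)]
```

For example, under these assumptions weapon type 1 against the C2 target uses 0.59 while all target types remain and 0.69 once both SAM types are gone.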

After several iterations, a population size of 50 was selected as a reasonable size
to begin exploration of the design space. For each method, the same initial population
was used, and, as appropriate, common random numbers were used to reduce
experimental noise. In addition, because of the convergence properties demonstrated,
50 generations were used. A representative example of the experimental results is
presented in Figure 13.
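
The effect of common random numbers can be shown with a small sketch. It is a toy stand-in for the Monte Carlo simulation, not the simulation used here: each replication fires one shot at each of four targets with the values $V$ from the experiments, and the kill probabilities `p_a` and `p_b` are hypothetical. The only point illustrated is that both alternatives are evaluated on the same seeds, so the random draws cancel in the comparison.

```python
import random

VALUES = (100, 150, 200, 300)  # target values V from the experiments


def destroyed_value(p_kill, seed):
    """One replication: one shot per target, driven by a seeded generator."""
    rng = random.Random(seed)
    return sum(v for v, p in zip(VALUES, p_kill) if rng.random() < p)


def mean_difference(p_a, p_b, seeds):
    """Average of (value under A) minus (value under B), reusing each seed for both."""
    diffs = [destroyed_value(p_a, s) - destroyed_value(p_b, s) for s in seeds]
    return sum(diffs) / len(diffs)


seeds = range(1000)
p_a = (0.6, 0.7, 0.8, 0.9)  # hypothetical alternative A
p_b = (0.5, 0.6, 0.7, 0.8)  # hypothetical alternative B
crn_gap = mean_difference(p_a, p_b, seeds)
```

With shared seeds, identical alternatives produce a difference of exactly zero, and an alternative that is at least as good on every target never scores lower in any single replication. With independent seeds neither property would hold, which is the noise the shared draws remove.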

In this instance, the baseline GA with randomly generated sequencing outperformed
the integrated ADP GA method, though the difference in solution quality may be of
little practical significance. For the baseline GA, the solution is
$x = (x_{\cdot 1} = (2, 1, 3, 0),\ x_{\cdot 2} = (0, 0, 2))$ and the weapons would be fired
(myopically) over two stages. For the integrated ADP method, the solution is
$x = (x_{\cdot 1} = (0, 4, 0, 0),\ x_{\cdot 2} = (2, 0, 0))$. In both cases, the mix of weapons
converged rather quickly (in approximately 10 generations), while the sequencing or
allocation strategy continued evolving. As a note, from an acquisition perspective,
because of the development, maintenance, and sustainment costs associated with
numerous high-value weapon types, the integrated ADP method may have resulted in a
more desirable solution.

Figure 13. Plot of small scale genetic algorithm results

The second set of experiments was run on a slightly larger problem where $c_1 = 8$,
$c_2 = 2$, $Y_t = (2, 2, 2, 1)^T$, and $V = (100, 150, 200, 300)$. The results for the larger
problem are shown in Figure 14.

For the second experiment, Figure 14 shows a marked improvement in solution
quality using the integrated ADP method. Consistent with the results found in
[76], there is an approximately 15.5% improvement over random sequencing with
myopic allocation. One explanation for the difference in results is that, as problem
size increases, there is a greater benefit to allocating weapons while considering the
impact those allocations may have on the future of the system. The solutions for the
baseline and ADP methods are $x = (x_{\cdot 1} = (2, 5, 1, 0),\ x_{\cdot 2} = (0, 1, 1))$ and
$x = (x_{\cdot 1} = (2, 5, 0, 1),\ x_{\cdot 2} = (1, 0, 1))$, respectively. The fact that both methods
converged to very similar solutions provides some validation of the proposed solution
framework. It also further emphasizes the impact that the system dynamics have on
the solution. Additionally, looking at the set of solutions presented, weapon type two
appears to be a dominant weapon that would be of interest to those making critical
acquisition decisions.

Figure 14. Plot of larger scale genetic algorithm results

6.6  Conclusions 

This research presents an embedded optimization problem in which the solution
of a WTA problem is used to determine the utility needed to solve a multidimensional
knapsack problem. Two methods were presented that are shown to converge to quality
solutions using different allocation determinations. For larger problem sizes, the
integrated ADP method outperforms the baseline method with random sequencing
and myopic allocation. This solution quality, however, comes at a price. Because
each gene represents a unique optimization problem, the ADP method must load
problem data into memory prior to executing a solution. As problem size increases,
this may be computationally impractical when compared to the random sequencing
method. Since both methods converged to a similar solution for the multidimensional
knapsack problem, it may be more effective to do a quick GA search of the space using
the baseline method and follow it up using ADP to determine a better employment
strategy. The computational burden may also be mitigated through the use of better
computing languages, higher powered computers, or distributed computing.

Numerous other areas will be explored in this ongoing research area. First, this
formulation assumes known weapons effects, when that may not necessarily be the
case. Future research will consider Bayesian updates of the kill probabilities as
feedback from the simulation outputs. Additionally, because they have been shown to
be effective, hybrid heuristics may be explored to improve solution quality in fewer
generations. As alluded to, analysts may be interested in exploring only a few weapon
types further, so constraints may be added to the knapsack problem that reduce the
number of weapon types allowed in any single gene structure. Similarly, if certain
weapon types can be used on either aircraft, but some weapon types are only allowed
on a specific aircraft, complexity increases. Another area would use heuristics for
the static WTA problem that consider the impact of allocations on future events.
This may help increase solution quality for the much faster baseline method.
Lastly, the ultimate purpose of this work is to integrate it within a high-level combat
simulation in lieu of the simple Monte Carlo simulation presented above. This will
provide analysts with a means for effectively determining which weapons concepts to
explore further, how to appropriately fit a set of aircraft with these weapon types, and
how to effectively employ them within a given scenario.

VII. Conclusions and Recommendations


Several  conclusions  can  be  highlighted  based  on  DP  extensions  and  computational 
results.  This  chapter  reviews  the  research,  provides  concluding  insights  about  the 
results,  and  identifies  topics  for  future  research  efforts. 

7.1  Summary  of  Effort 

Throughout this effort, several significant and original contributions are made
to the field of operations research by developing new models for investigation,
identifying novel solution techniques, and performing computational studies. First,
an efficient solution methodology is presented that determines the optimal weapons
allocation for a two-stage DWTA problem instance. This is the first provably optimal
algorithm for this problem instance. Next, the two-stage problem is extended to
consider the dependency across stages when determining allocation policies; the
resulting method demonstrates improvement over existing methods and effective
scalability for large problems. In addition, this dissertation formulates and solves a
previously undefined instance of the DWTA problem that incorporates dynamic
probabilities of kill, using problem structure to develop effective solution strategies.
To address this problem, a rigorous and novel approximate dynamic programming
method is developed which reduces the decision space to a more computationally
tractable size. Several distributions were investigated which use the problem structure
to reinforce the selection of quality decisions. Finally, an embedded optimization
problem which seeks to optimize an aircraft weaponeering policy is developed. This
optimization problem defines the utility of a weapon through the solution to a
weapon-target assignment problem. These utilities are then used to solve a constrained
multi-dimensional knapsack problem that represents placing weapons on a set of
aircraft. A GA is used as the solution framework, and two techniques that integrate
the sequential allocation of weapons into the gene structure are compared and
contrasted. Through the development of the GA, this dissertation effectively
determines locally optimal weaponeering policies. In addition, this research develops
a defensible methodology for real-time allocation strategies within simulation
applications for current practitioners.

7.2  Conclusion 

Results for this research demonstrate the contribution of the effort. In each case
tested, high quality solutions are generated in comparatively little computation time.
For the two-stage problem, the algorithm's ability to determine optimal solutions is
proven through several theorems. Further, the computational complexity of the
method is shown to yield solutions much more efficiently. Next, computational results
for the two-stage extension demonstrate the effectiveness of the approximate dynamic
programming methodology in obtaining near optimal solutions for various problem
instances in much less computation time than methods currently in the literature.
Additionally, as problem size increases, results show a substantial improvement in
solution quality in less computation time than other techniques have demonstrated.
Because of the combinatorial nature of the weapon target assignment problem,
determining an exact solution using dynamic programming is computationally
intractable. Further, current literature does not provide methods to appropriately
address the vast size of the decision space for any given state. The solution methodology
presented in this research greatly reduces the size of the decision space necessary for
investigation, and exploits the special structure of the problem to maintain solution
quality in an efficient manner. Finally, by integrating the sequential allocation policies
into a GA, two options are available which trade off computation time for solution
quality when determining the optimal weapons mix. Results show that random
generation of the sequence with a myopic allocation strategy is fast, but does not give the
solution quality provided by determining near optimal sequential allocation policies
using ADP. Overall, this research presents a defensible approach that addresses gaps
in the literature and novel approaches for the solution of the motivating problems. In
each case, numerous tests are run and the results presented. As with many research
efforts, as questions get answered, new questions arise.

7.3 Future Work

With  each  of  the  presented  areas  of  research  comes  a  stream  of  potential  future 
research.  Extensions  may  be  investigated  for  each  of  the  problem  types,  along  with 
the  increase  in  complexity  through  the  alteration  of  assumptions.  For  each  effort,  an 
associated  discussion  of  future  research  is  presented. 

7.3.1  Shoot-look-shoot. 

The two-stage DWTA problem has many identifiable extensions. First, the model
can be extended to include the impact of cost on the approximation scheme as well as
the effect sensors may have in the first stage, second stage, or across both stages.
Additionally, as weapons for this effort are currently homogeneous within a stage, a
natural extension will investigate non-homogeneous weapons in and across stages.
Further, because the subgradients represent the marginal increase in value from
reserving a weapon for future stages, the algorithm may be very effective in instances
where there are more than two stages. Therefore, additional research may extend this
to multiple stages. Finally, the presented method starts with all weapons initially
allocated in stage one. This research may be extended to explore the initial allocation
of weapons in the second stage, or some other initial allocation policy.

7.3.2 Cooperative DWTA Problem.


Because this is a novel formulation, there is an extensive list of future work. First,
different approximate dynamic programming techniques should be investigated to
address the dimensionality of the decision space. Though it was slower computationally,
implementing a multi-step look-ahead solution within the myopic framework
may result in better solution quality because it is an exact method in the sense that
it iterates over the full state and decision spaces. Additionally, a reduced decision
space could be coupled with state reduction techniques such as aggregation to further
reduce computation time while maintaining solution quality. Finally, roll-out
algorithms, which take into account the future impact of current decisions,
may be implementable within a simulation framework to quickly determine good
policies for problems of a larger size.
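
To make the roll-out idea concrete, the following sketch applies a generic simulation-based roll-out to a toy single-weapon-per-stage problem. It is not a method developed or tested in this research. All names (`step`, `greedy_policy`, `simulate`, `rollout_action`) are hypothetical, the kill probabilities are static rather than state dependent, and the base heuristic is a simple greedy rule chosen only for illustration.

```python
import random


def step(alive, target, p_kill, rng):
    """Fire one weapon at `target`; return the resulting tuple of alive flags."""
    if alive[target] and rng.random() < p_kill[target]:
        return alive[:target] + (False,) + alive[target + 1:]
    return alive


def greedy_policy(alive, values, p_kill):
    """Base heuristic: shoot the alive target with the largest expected value."""
    candidates = [j for j, is_alive in enumerate(alive) if is_alive]
    return max(candidates, key=lambda j: values[j] * p_kill[j])


def simulate(alive, shots_left, values, p_kill, rng):
    """Follow the base heuristic until shots or targets run out; return value destroyed."""
    start = alive
    while shots_left > 0 and any(alive):
        alive = step(alive, greedy_policy(alive, values, p_kill), p_kill, rng)
        shots_left -= 1
    return sum(v for v, a0, a1 in zip(values, start, alive) if a0 and not a1)


def rollout_action(alive, shots_left, values, p_kill, n_sims=200, rng=None):
    """Choose the first shot by simulating the base heuristic after each candidate."""
    rng = rng or random.Random()
    best, best_score = None, float("-inf")
    for j in (k for k, is_alive in enumerate(alive) if is_alive):
        total = 0.0
        for _ in range(n_sims):
            nxt = step(alive, j, p_kill, rng)
            gained = values[j] if nxt != alive else 0.0
            total += gained + simulate(nxt, shots_left - 1, values, p_kill, rng)
        score = total / n_sims
        if score > best_score:
            best, best_score = j, score
    return best
```

Each candidate first shot is scored by the average value destroyed when the base heuristic finishes the engagement, so the choice reflects the downstream consequences of the current decision at the cost of `n_sims` simulations per candidate.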

7.3.3  Embedded  Optimization  Framework. 

Finally, the embedded optimization framework is in its infancy and much is left
to be accomplished. First, the present formulation assumes known weapons effects,
though because future weapons concepts are being investigated, weapons effects are
likely unknown. Future research will consider Bayesian updates of the kill probabilities
as feedback from the simulation outputs. Additionally, because they have been
shown to be effective, hybrid heuristics may be explored to further improve solution
quality. As alluded to, analysts may be interested in exploring only a few weapon
types further, so constraints may be added to the knapsack problem that reduce the
number of weapon types allowed in any single gene structure. Similarly,
aircraft-specific weapons may be investigated as an additional constraint in the model. This
would likely increase the complexity of the model and may impact the effectiveness of
the developed solution methodology. Another area would use heuristics for the static
WTA problem that consider the impact of allocations on future events. This
may help increase solution quality for the much faster baseline method. The ultimate
purpose of this research effort is to integrate it within a high-level combat simulation
in lieu of the simple Monte Carlo simulation. This will provide analysts with a means
for effectively determining which weapons concepts to explore further, how to
appropriately fit a set of aircraft with these weapon types, and how to effectively employ
them within a given scenario. Lastly, making use of distributed computing as well
as high-powered computing resources should be investigated to assist with real-time
decision making.
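
One simple form such a Bayesian update could take is the conjugate Beta-Bernoulli model, sketched below. This choice is an assumption made for illustration and is not specified in this research: the kill probability of a weapon against a target type is given a Beta($\alpha$, $\beta$) belief, and each simulated shot is treated as an independent kill or miss. The function names are hypothetical.

```python
def update_kill_belief(alpha, beta, kills, shots):
    """Posterior Beta parameters after observing `kills` successes in `shots` trials."""
    if not 0 <= kills <= shots:
        raise ValueError("kills must be between 0 and shots")
    return alpha + kills, beta + (shots - kills)


def posterior_mean(alpha, beta):
    """Point estimate of the kill probability under a Beta(alpha, beta) belief."""
    return alpha / (alpha + beta)


# Hypothetical example: a uniform Beta(1, 1) prior and 7 kills in 10 simulated shots.
alpha, beta = update_kill_belief(1.0, 1.0, kills=7, shots=10)
estimate = posterior_mean(alpha, beta)
```

The updated estimate could then replace the assumed kill probability when gene utilities are re-evaluated, so that the weapons mix responds to what the simulation reveals about weapons effects.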




Appendix A. Data Tables and additional figures

Figure 15. Results for small sized experiments at varying W & T
[Value comparison for Problems 11-20; x-axis: Problem Number. Panels: (a) W = 5, T = 5; (b) W = 5, T = 10; (c) W = 10, T = 5; (d) W = 10, T = 10.]

Figure 16. Results for small sized experiments at varying W & T
[Value comparison for Problems 21-30; legend: CW Heur, ADP, MMR sim. Panels: (a) W = 5, T = 5; (b) W = 5, T = 10; (c) W = 10, T = 5; (d) W = 10, T = 10.]

Figure 17. Results for small sized experiments at varying W & T
[Value comparison for Problems 31-40. Panels: (a) W = 5, T = 5; (b) W = 5, T = 10; (c) W = 10, T = 5; (d) W = 10, T = 10.]

Figure 18. Results for small sized experiments at varying W & T
[Value comparison for Problems 41-50; legend: CW Heur, ADP, MMR sim. Panels: (a) W = 5, T = 5; (b) W = 5, T = 10; (c) W = 10, T = 5; (d) W = 10, T = 10.]

Figure 19. Results for small sized experiments at varying W & T
[Value comparison for Problems 51-60; x-axis: Problem Number; legend: CW Heur, ADP, MMR sim. Panels: (a) W = 5, T = 5; (b) W = 5, T = 10; (c) W = 10, T = 5; (d) W = 10, T = 10.]

Figure 20. Results for small sized experiments at varying W & T
[Value comparison for Problems 61-70. Panels: (a) W = 5, T = 5; (b) W = 5, T = 10; (c) W = 10, T = 5; (d) W = 10, T = 10.]

Figure 21. Results for small sized experiments at varying W & T
[Value comparison for Problems 71-80; legend: CW Heur, ADP, MMR sim. Panels: (a) W = 5, T = 5; (b) W = 5, T = 10; (c) W = 10, T = 5; (d) W = 10, T = 10.]

Figure 22. Results for small sized experiments at varying W & T
[Value comparison for Problems 81-90; legend: CW Heur, ADP, MMR sim. Panels: (a) W = 5, T = 5; (b) W = 5, T = 10; (c) W = 10, T = 5; (d) W = 10, T = 10.]

Figure 23. Results for small sized experiments at varying W & T
[Value comparison for Problems 1-10; x-axis: Problem Number; legend: CW Heur, ADP, MMR sim. Panels: (a) W = 5, T = 5; (b) W = 5, T = 10; (c) W = 10, T = 5; (d) W = 10, T = 10.]

Figure 24. Results for medium sized experiments at varying W & T
[Value comparison for Problems 11-20. Panels: (a) W = 10, T = 20; (b) W = 20, T = 10; (c) W = 20, T = 20.]

Figure 25. Results for medium sized experiments at varying W & T
[Value comparison for Problems 21-30. Panels: (a) W = 10, T = 20; (b) W = 20, T = 10; (c) W = 20, T = 20.]

Figure 26. Results for medium sized experiments at varying W & T
[Value comparison for Problems 31-40; x-axis: Problem Number. Panels: (a) W = 10, T = 20; (b) W = 20, T = 10; (c) W = 20, T = 20.]

Figure 27. Results for medium sized experiments at varying W & T
[Value comparison for Problems 41-50. Panels: (a) W = 10, T = 20; (b) W = 20, T = 10; (c) W = 20, T = 20.]

Figure 28.
[Value comparison for Problems 11-20. Panels: (a) W = 100, T = 100; (b) W = 200, T = 100; (c) W = 200, T = 200; (d) W = 400, T = 100; (e) W = 400, T = 200; (f) W = 400, T = 400.]

Figure 29.
[Value comparison for Problems 21-30. Panels: (a) W = 100, T = 100; (b) W = 200, T = 100; (c) W = 200, T = 200; (d) W = 400, T = 100; (e) W = 400, T = 200; (f) W = 400, T = 400.]

Figure 30.
[Value comparison for Problems 31-40. Panels: (a) W = 100, T = 100; (b) W = 200, T = 100; (c) W = 200, T = 200; (d) W = 400, T = 100; (e) W = 400, T = 200; (f) W = 400, T = 400.]

Figure 31.
[Value comparison for Problems 41-50; x-axis: Problem Number. Panels: (a) W = 100, T = 100; (b) W = 200, T = 100; (c) W = 200, T = 200; (d) W = 400, T = 100; (e) W = 400, T = 200; (f) W = 400, T = 400.]